THE STATISTICAL POWER OF ABNORMAL-SOCIAL
PSYCHOLOGICAL RESEARCH: A REVIEW

JACOB COHEN

New York University


Given an experimental effect in a popula- 
tion, how likely is the null hypothesis to be 
rejected? Equivalently, what is the power of 
the statistical test? What is the expectation 
that the (false) null hypothesis will be sus- 
tained and thus a Type II error committed? 

It is a remarkable phenomenon that the research which is reported
by psychological investigators rarely refers to this issue, and even
more rarely actually investigates it. On the other hand, issues
concerning Type I error or “significance,” i.e., the validity of the
rejection of the null hypothesis, are more or less conscientiously
attended to. This marked asymmetry of sophistication and attention
to these two types of error is mirrored, and largely determined, by
the exposition of these issues in the statistics textbooks used in
the graduate training of the investigators. These texts are
characterized by an early explanation of Type I and Type II errors,
followed by a neglect of the latter throughout the remainder of the
text. Thus, every statistical test is described with careful
attention to issues of significance, and typically no attention to
power. (For a partial exception, see Walker & Lev, 1953.)

The problem of power is occasionally approached indirectly by
concern with the sample size to be used in an investigation. Other
things equal, power is a monotonic function of sample size, but
decisions as to sample size are typically reached by recourse to
local tradition, ready availability of data, unaided intuition,
usually called “experience,” and negotiation (the latter usually
between doctoral candidate and sponsor, or author and
editor) and rarely on the basis of a Type II 
error analysis, which can always be performed 
prior to the collection of data. These non- 
rational bases for setting sample size must 
often result in investigations being undertaken 
which have little chance of success despite the 
actual falsity of the null hypothesis, and 
probably less often in the use of a far larger 
sample than is necessary. Either of these 
circumstances is wasteful of research effort. 
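
A Type II error analysis of the kind referred to above can be carried out before any data are collected. The following is a minimal sketch in Python, assuming a two-sided comparison of two independent means with equal group sizes and a large-sample normal approximation. The function name `n_per_group` and the target power of .80 are illustrative assumptions and do not come from the text.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(d, power=0.80, alpha=0.05):
    """Approximate cases needed per group to detect a standardized mean
    difference d with a two-sided test (normal approximation, equal n)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = z.inv_cdf(power)          # normal quantile for the target power
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Standardized differences of one quarter, one half, and one sigma.
for d in (0.25, 0.50, 1.00):
    print(d, n_per_group(d))
```

Because the required n grows with the inverse square of the effect, halving the effect to be detected roughly quadruples the sample needed. A t-based calculation would give slightly larger values than this approximation.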

Stemming from these considerations, a pro- 
gram of investigation, computation, and re- 
portage has been undertaken whose major 
aims are as follows: 

1. To call these issues to the attention of in- 
vestigators, consumers of research, and eval- 
uators of research planned or completed 
(sponsors, agency panels, journal editors). 

2. To provide tables and conventional 
standards which will facilitate the perform- 
ance of power analyses for the most common 
statistical tests. 

3. To conduct surveys of the psychological 
research literature to assess its current status 
with regard to power. 

The present report is the first of the investigation, and seeks to
achieve partially the first and third aims.² Specifically, it
describes the results of a survey of the Journal of Abnormal and
Social Psychology, 1960, 61, from the viewpoint of the power of the
statistical tests employed. Less formally, it seeks to answer the
question, “What kind of chance did these investigators have of
rejecting false null hypotheses?”


METHOD 


The statistical literature was searched for formulae and nomographs
of power functions of the most commonly used statistical tests, from
which were prepared tables relating power to the parameters of which
it is a function: (a) size of effect (degree of departure in the
population from the null hypothesis), (b) significance (Type I error)
level, (c) sample size, and (d) choice of directional or
nondirectional test (i.e., [...]). The sources of the formulae or
nomographs are given in Table 1.³

² A more detailed description of the statistical rationale, as well
as the resulting power tables, is presently in preparation for
separate publication.


Standard Conditions 


For the purpose of the survey, it was necessary to formulate a set
of reasonable standard conditions on the basis of which the power of
each test could be computed. Whether or not other significance
criteria were indicated, the .05 Type I error level was used
uniformly.⁴ Further, whether or not otherwise specified, the
nondirectional version of the null hypothesis was used throughout.
This is the two-sided test for normal, binomial, and t
distributions, and the logically equivalent one-sided [...] value
test for χ² and F distributions, as usually tabled and used in
hypothesis testing. Although this criterion may lead to [...]

³ See Footnote 2.

⁴ With few exceptions, the articles provided no evidence of a
significance level being set prior to data collection, either
because it was not deemed worth mentioning, or none had indeed been
set. In any case, and rightly or wrongly, the 5% level has trickled
down from agronomy as a conventional standard, and is usually
understood to be the significance criterion if no other is
mentioned.

Size of Effect

[...] These values are arbitrary, but were chosen [...]. The reader
can judge their reasonableness from the ensuing discussion, and
whatever his judgment, [...] accept them as conventional. Discussion
is necessary to amplify, exemplify, and perhaps justify the decisions
summarized in Table 2 for each type of statistical test of the null
hypothesis in turn.


1. t test for two means. Consider the medium level: it posits the
existence of a difference between population means amounting to one
half of the population sigma. In more generally familiar terms, this
would be exemplified by a research plan that sought to detect a
difference of 8 points between the mean IQs of two populations.
Similarly, small and large IQ mean differences would amount to 4 and
16 points, respectively. [...]

TABLE 2
PARAMETERS WHICH DEFINE THE [...] VARIOUS STATISTICAL TESTS

[...]

3. Normal deviate test for a difference between r’s. An analogous
problem to that of two proportions exists here: a given population
difference between two Pearson correlation coefficients is of
varying detectability as a function of their level, even more so
than in the case of proportions, e.g., with two samples of 50 cases
each, the power under our standard conditions to detect a difference
between population r’s of .10 and .30 is .17, while for .70 and .90
it is .83. An exactly parallel solution was used, i.e., the sample
values were used to approximate the level of population correlation
of the test. Again, the problem was avoidable by using the
difference in Fisher z transformation values to define size of
effect (these are invariant for level of population r’s), but again
considerations of awkwardness and unfamiliarity led to the rejection
of this alternative.

The argument of perceptibility for the definition of a correlation
difference of .2 as medium (Table 2) is not uniformly convincing. At
high and possibly at moderate levels of correlation, such a
population difference would be noticeable, but not, say, when the
population r’s were .10 and .30. This difficulty, too, could have
been avoided by definition via differences in Fisher z
transformation values. In any case, this decision could not affect
the results of the survey, since only a few minor instances of this
statistical test were encountered. Small and large effects were
again symmetrically defined as .10 and .30.

4. t test that a population r = 0. There were no technical
complications here, but the choice of “reasonable” values defining
the levels of size of effect proved troublesome. Initially, for the
sake of comparability, it was planned to use values of the r which
were implied by those selected for the t test between means, since
any difference between (standardized) means can be expressed as a
(point biserial) correlation coefficient, or vice versa. This led
(on the assumption of populations of equal size⁵) to values of .125,
.25, and .50 for the respective levels of size of effect. On the
generally untenable further assumption that the populations result
from the dichotomization of a normally distributed variable, the
resulting biserial r values are somewhat larger, .16, .31, and .63.
Thus, from this point of view we have as candidates for the
definition of a medium effect, coefficients of .25 or (more
questionably) .31.

⁵ Assumptions of inequality lead to smaller values.

On the other hand, conventional verbal descriptions would consider a
“small” correlation, one between .20 and .40, a “moderate” one
between .40 and .70, and a “high” one between .70 and .90 (Guilford,
1956, p. 145). Guilford points out that these verbal terms may be
misleading, and notes that “the validity coefficient for a single
test may be expected in the range from .00 to .60, with most of them
in the lower half of that range” (p. 146).

Thus, the medium correlation defined for compatibility with the
criterion for a t test between means would be about .25-.30, in
conventional abstract terms about .50-.60, and in specific
application as test validity coefficients, perhaps about .30-.40. A
compromise among these considerations was struck: a medium effect
size was defined as .40, with small and large effects, respectively,
as .20 and .60. These are smaller than would be dictated by the
abstract conventions, but rather more generous (i.e., give higher
power estimates) than the criteria of the other statistical tests,
and are reasonably in keeping with at least one common application
of correlation, validity coefficients.
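
The step from point biserial to biserial values in item 4 can be checked with a short calculation. The sketch below assumes the standard textbook conversion (the point biserial r multiplied by the square root of pq and divided by the normal ordinate at the point of dichotomy), with equal-sized populations so that p = .5. The function name is illustrative, and the text itself does not state which formula was used.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_point_biserial(r_pb, p=0.5):
    """Convert a point biserial r to a biserial r for a dichotomy that
    cuts a normal variable so that a proportion p falls in one group."""
    z = NormalDist()
    ordinate = z.pdf(z.inv_cdf(p))  # normal density at the cut point
    return r_pb * sqrt(p * (1 - p)) / ordinate


# Point biserial values quoted in the text for small, medium, large.
for r_pb in (0.125, 0.25, 0.50):
    print(r_pb, round(biserial_from_point_biserial(r_pb), 2))
```

Under these assumptions the conversion factor is about 1.25, which gives the "somewhat larger" biserial values quoted in the text.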

5. Sign test. This is more generally a test of the hypothesis that
the population proportion having a given characteristic equals .5,
and is accomplished by reference to the binomial distribution for
small samples and the normal distribution for n > 25, where it gives
an adequate approximation to the binomial. The same criteria were
used for levels of size of effect here as were used for the
hypothesis that two population proportions differ, i.e., .10, .20,
and .30 for small, medium, and large effects, respectively (Table
2), and on the basis of the same considerations.
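
For samples large enough for the normal approximation, the power of the sign test can be sketched as follows. This is an illustrative calculation, not the author's tabling procedure: it assumes a two-sided test of a population proportion of .5, uses the null standard error for the critical region and the alternative standard error for power, and applies no continuity correction. Its results therefore need not agree exactly with the values reported in the paper.

```python
from math import sqrt
from statistics import NormalDist


def sign_test_power(p_true, n, alpha=0.05):
    """Approximate power of a two-sided test that a proportion is .5
    when the population proportion is actually p_true."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se_null = sqrt(0.25 / n)                   # standard error under H0
    se_true = sqrt(p_true * (1 - p_true) / n)  # standard error under H1
    upper = 0.5 + z_crit * se_null             # rejection boundaries
    lower = 0.5 - z_crit * se_null
    return (1 - z.cdf((upper - p_true) / se_true)) + z.cdf((lower - p_true) / se_true)


# Medium effect: population proportion .70 rather than .50.
for n in (34, 68):
    print(n, round(sign_test_power(0.70, n), 2))
```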

6. F test for k means. The population parameter f (Table 2) used to
define degree of departure from the null hypothesis was the standard
deviation of the k standardized population means, i.e., of the means
expressed in units of the common population sigma, or as z scores.
For the t test for two means, the parameter was the absolute
difference between the two means so expressed; here, for k means, it
is their standard deviation which measures their departure from each
other, and therefore from the null hypothesis which holds them to be
equal.

Since t is merely a special case of F (i.e., its square root when
the numerator has 1 df) it was possible to define the levels of the
parameter f to make them consistent with those of t. Expressed in
terms of f, the t criteria are, respectively, .125, .25, and .50
(Table 2). Taking the medium level, .25, we illustrate for varying
numbers of samples, the population means implied. For this
illustration the means are equally spaced⁶ and are expressed in both
standard and IQ units:

          k = 3                  k = 4
    Standard     IQ        Standard     IQ
     +.306     104.9        +.335     105.4
      0        100.0        +.112     101.8
     -.306      95.1        -.112      98.2
                            -.335      94.6

Note that for two samples, the difference between means is .50, as
defined for the medium level of the t test for means. Note also that
the standard deviation of each column of standardized means is .25,
and of IQ means (.25) (16) = 4.

The illustration serves as a guide as to the size of the disparities
between means defined as a medium effect. Small effects are arrived
at by halving the gaps between means, large effects by doubling
them.

⁶ When there are more than two means, specifying f does not fix the
standard mean values. To concretize the exposition, the further
specification of their distribution is necessary, and equal spacing
is chosen because it leads to maximum separation of the extreme
means as well as for its intuitive simplicity. The power computed,
however, is independent of the spacing, but is simply a function of
f (Dixon & Massey, 1957, p. 257).

7a. χ² test that k proportions are equal. This test created the
greatest problem in the selection of a parameter to define degree of
departure from the null hypothesis. A plan to follow the same
procedure as for k means, namely, a fixed standard deviation of the
population proportions, was frustrated by the fact that proportions
are bounded by zero and one, so that as the number of samples
increases, with sigma fixed and 1/k decreasing toward zero, negative
values are called for. After further exploration with other
approaches, the problem was finally solved by choosing as a
parameter of size of effect the ratio of the largest population
proportion to the smallest, with equal spacing of the k proportions.
So specified, this leads in turn (for any given value of k) to the
standard departure function used with the noncentral χ² distribution
(Patnaik, 1949, and formula for λ given in 7b, Table 2).

Once the parametric function was chosen, specific values were then
selected to define the levels (Table 2). Focusing again on the
critical medium level, a ratio of 2:1 leads to the following
specification of population departures from the null hypothesis of
equiproportionality for some illustrative values of k:

    [...]    k = 5
             .267
             .233
             .200
             .167
             .133

It should be noted that by this criterion as k increases, other
things equal, the departure function λ and hence the power
decreases. Note, too, that for k = 2, this criterion is not quite
the same as for the statistically equivalent sign test (Table 2),
which calls for populations of .70 and .30 and hence a criterion of
2 1/3:1, instead of the 2:1 value used. This incompatibility was
tolerated both because of the greater simplicity of the latter, and
also because the former gave rise to discrepancies between
proportions for larger values of k which seemed intuitively too
large to be deemed medium. A large effect is defined as a ratio
twice as large as those illustrated above (i.e., 4:1) and a small
effect as one three-quarters as large (i.e., 3:2), since a ratio
half as large defines no effect (i.e., 5:5).

7b. χ² contingency test. A definition of size of population
contingency which was both simple and direct could not be achieved,
in contrast to the other tests. Instead, the same criteria values λ
were used here as those which resulted from the ratio of largest to
smallest proportion in the simpler one-dimensional χ² test (Table 2,
line 7a). Thus, for any given number of degrees of freedom, medium
contingency is implicitly defined as departure from null association
(as measured by λ) equal to that of medium departure from the
equiproportionality hypothesis of 7a, i.e., a 2:1 ratio of extremes
(Table 2). This results in λ values which vary as a function of df.
What this leads to, as a definition of medium size of contingency,
is perhaps more clearly illustrated by examples than described.
Following are some contingency tables of varying degrees of freedom
whose proportions exemplify medium contingency (decimal points
omitted):

    [...]

    df = 3 (λ = .0617)
    167  139  111  083
    083  111  139  167

    df = 4
    133  117  100  083  067
    067  083  100  117  133

    df = 4
    084  098  152
    098  138  098
    152  098  084

Note that in the 2 × k tables above the extreme columns’ cells are
in 2:1 ratio and the values in each row are equally spaced. Of
course, other tables of equal size of effect (therefore leading to
equal power) can be constructed, provided that they yield the λ
value appropriate to the df involved.

Limitations of space preclude the presentation of tables which
exemplify small and large contingency effects, but the interested
reader can construct his own by analogy with the material presented.
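
The equal-spacing illustrations in items 6, 7a, and 7b can be reproduced with a few lines of arithmetic. The sketch below is a reconstruction under stated assumptions rather than the author's own procedure: the function names are illustrative, the IQ scale is taken to have mean 100 and sigma 16, and the departure function for a table of population proportions is assumed to be the sum over cells of (proportion minus expected proportion) squared, divided by the expected proportion.

```python
from math import sqrt


def equally_spaced_means(k, f):
    """k equally spaced standardized means (k >= 2) whose standard
    deviation, computed with divisor k, equals f."""
    step = f / sqrt((k * k - 1) / 12.0)
    return [((k - 1) / 2.0 - i) * step for i in range(k)]


def equally_spaced_proportions(k, ratio):
    """k equally spaced proportions (k >= 2) summing to 1, with the
    largest equal to ratio times the smallest; listed largest first."""
    smallest = 2.0 / (k * (1.0 + ratio))
    step = smallest * (ratio - 1.0) / (k - 1)
    return [smallest + (k - 1 - i) * step for i in range(k)]


def departure(table):
    """Assumed departure function for a table of joint proportions."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = 0.0
    for i, row in enumerate(table):
        for j, p in enumerate(row):
            expected = row_totals[i] * col_totals[j]
            total += (p - expected) ** 2 / expected
    return total


# Medium effect (f = .25) for three and four means, in IQ units.
for k in (3, 4):
    print(k, [round(100 + 16 * m, 1) for m in equally_spaced_means(k, 0.25)])

# Medium effect (2:1 ratio) for five proportions.
print([round(p, 3) for p in equally_spaced_proportions(5, 2.0)])

# A 2 x 4 table built from the four-proportion case, halved across two rows.
props = equally_spaced_proportions(4, 2.0)
table = [[p / 2 for p in props], [p / 2 for p in reversed(props)]]
print(round(departure(table), 4))
```

Under these assumptions the 2 × 4 table yields the same departure value as the four-proportion case of 7a, which is how the text defines medium contingency.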


Survey Procedure 


With power tables prepared for .05 level nondirectional tests for
varying values of n for each statistical test type, and the size of
effect levels chosen, the next step was the survey of articles in
the volume. Each article was read in turn, and the nature of each
statistical test performed (or implied) in the article was noted.
Generally, when sample sizes (and for F tests and χ², df) were added
to the standard conditions, the power of the test for small, medium,
and large effect size could be read directly from the appropriate
prepared tables, or by interpolation between tabled values. The
statistical tests given in Table 1 are not inclusive of all used in
the volume; most noteworthily, nonparametric tests based on ranks
could not be studied from the point of view of power due to
unavailability of systematic studies of this issue in the
literature. In the relatively few instances where such tests had
been used, the power was determined for the analogous parametric
test, e.g., the t test for means for the Mann-Whitney U test and for
the Wilcoxon matched-pairs signed-ranks test, and the F test for the
Kruskal-Wallis H test and for the Friedman test (Siegel, 1956). Note
that the effect of this substitution was to slightly overestimate
the power of the tests on the usual assumption that the conditions
required by the parametric tests obtained. Even if this assumption
is questioned, it is quite unlikely that the substitution results in
an underestimation of power. In general, in the few instances where
statistical tests were so described as to leave a doubt about the
exact details, the doubt was resolved in favor of higher power
estimates. For example, if a group of n cases was divided into two
subgroups for comparison, but the subgroup sizes were not given, it
was assumed they were equal, which then leads to a maximum power
estimate for that value of n.

In this way, the power was determined for the 4,829 statistical
tests⁷ in the volume. But it was desired to characterize the power
of each of the research studies in the volume. The typical article
involved a number of tests not all of equal relevance to its major
hypotheses. To determine an average set of power values across all
the statistical tests of an article might lead to a distorted
result, if, for example, a few hypothesis-relevant tests were
performed on the total data followed by a large number of subsidiary
exploratory tests on only a portion of the cases. The latter would
be less powerful (since sample sizes are smaller), more numerous,
and less relevant to the issues central to the investigation. These
considerations led to the classification of all tests performed
either as bearing directly on the status of the major hypotheses or
experimental issues, of which there were 2,088, or as being
peripheral to these issues, an additional 2,741. The latter
typically included such things as exploratory tests, routine tests
of the significance of all correlation coefficients in a factor
analysis, significance tests of reliability coefficients of
dependent variables, tests of “by-product” control variables or
unhypothesized interactions in analysis of variance designs, etc.

⁷ Fortunately, this did not demand that many separate
determinations. In these characteristically multivariable studies, a
single test, e.g., the significance of an r for a given n, might be
applied as many as 861 times, i.e., to each intercorrelation of a
42 × 42 matrix (to take the most extreme example), which counts as
861 statistical tests, but requires only a single power
determination.

TABLE 3
FREQUENCY AND CUMULATIVE PERCENTAGE DISTRIBUTIONS [...] SMALL,
MEDIUM, AND LARGE POPULATION EFFECTS [...]

Once the tests were so classified, the mean power of the major tests
was determined at the three levels of size of effect for each
research study.⁸ By this procedure, no matter what the number of
tests a particular study might involve, all articles count equally
in the description of the total volume. The mean power values of the
studies at hypothesized small, medium, and large population effects
were then distributed and their central tendency and variability
determined.
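
The aggregation just described, in which each study contributes one mean power value based on its major tests only, can be sketched as follows. The article names and power values below are invented purely for illustration and are not data from the survey.

```python
from statistics import mean, median

# Hypothetical power values (medium effect) for three invented articles;
# each test is tagged as major (True) or peripheral (False).
articles = {
    "article_a": [(0.60, True), (0.55, True), (0.20, False),
                  (0.20, False), (0.20, False), (0.20, False)],
    "article_b": [(0.40, True)],
    "article_c": [(0.70, True), (0.50, True), (0.30, False)],
}


def study_power(tests, major_only=True):
    """Mean power of one study, over major tests only by default."""
    values = [p for p, is_major in tests if is_major or not major_only]
    return mean(values)


# One value per study, so every article counts equally in the summary.
per_study = {name: study_power(tests) for name, tests in articles.items()}
print(per_study)
print(round(mean(per_study.values()), 3), round(median(per_study.values()), 3))

# Averaging over all tests lets numerous peripheral tests dominate.
print(round(study_power(articles["article_a"], major_only=False), 3))
```

In this invented example, the first article's mean drops from .575 to .325 when its four low-powered peripheral tests are pooled with the two major ones, which is the distortion the classification is meant to avoid.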


RESULTS 


There are, in all, 78 articles in the Journal of Abnormal and Social
Psychology, 1960, 61. Of these, 6 involved no statistical tests at
all (case reports, factor analytic studies, etc.) and two additional
articles (both factor analytic) involved no major tests as above
defined. The frequency and cumulative percentage distributions and
relevant descriptive statistics of the (mean) power to detect small,
medium, and large effects of the remaining 70 articles are given in
Table 3. As can be seen from the distributions and their summarizing
statistics, given .05 level nondirectional statistical tests of the
major hypothesis, the power to detect the size of effect levels
previously defined is as follows:

Small effects. On the average, the studies reviewed had only about
one chance in five or six of detecting small effects. About a fourth
of the articles had as much as one chance in four of yielding
significant results, and another fourth had no more than one chance
in eight under these conditions. Not one of the studies had as much
as a 50-50 chance of detecting a slight effect!

Medium effects. When one posits medium effects in the population
(generally of the order of twice as large as small effects) the
studies average slightly less than a 50-50 chance of successfully
rejecting their major null hypotheses. No more than one-quarter of
these studies have as good as three chances in five of succeeding
under these conditions and another quarter have less than one chance
in three.

Large effects. Only when one assumes large effects (roughly twice as
large as medium) does one find typically a good chance of rejecting
the major null hypotheses, about five out of six. Even under these
most favorable circumstances, a quarter of the studies have less
than three chances in four of succeeding.

Another way of viewing these results is to determine the proportion
of the studies which would meet the criterion of a Type II error
level as small as the conventional Type I level, namely, .05 (power,
therefore, would be at .95 or higher). None of the studies meet this
criterion when one posits small or even medium effects, and only 23%
(i.e., 16 of 70) meet it under conditions of large effect.

Incidentally, if the reader questions the validity of the author’s
judgment in classifying the statistical tests into major and
peripheral (see above, Survey Procedure) or is for any reason
curious about the power of the researches when all tests, major and
peripheral, are considered, the power means for small, medium, and
large effects are .20, .50, and .83, respectively, hardly different
from the means in Table 3.

⁸ The less important mean power for all tests, both major and
peripheral, was also found for each article.


DISCUSSION 


The results indicate that the investigators contributing to Volume
61 of the Journal of Abnormal and Social Psychology had, on the
average, a relatively (or even absolutely) poor chance of rejecting
their major null hypotheses, unless the effect they sought was
large. This surprising (and discouraging) finding needs some further
consideration to be seen in full perspective.

First, it may be noted that with few exceptions, the 70 studies did
have significant results. This may then suggest that perhaps the
definitions of size of effect were too severe, or perhaps, accepting
the definitions, one might seek to conclude that the investigators
were operating under circumstances wherein the effects were actually
large, hence their success. Perhaps, then, research in the
abnormal-social area is not as “weak” as the above results suggest.
But this argument rests on the implicit assumption that the research
which is published is representative of the research undertaken in
this area. It seems obvious that investigators are less likely to
submit for publication unsuccessful than successful research, to say
nothing of a similar editorial bias in accepting research for
publication. Consider this paradigm: 100 investigations are
undertaken in which, in fact, there is actually a medium population
effect. From the above findings, about 50 get positive results and
are likely to come to publication; the other 50 fail to reject their
(assumed false) null hypotheses and are unlikely to come to
publication. Thus, the general success of the articles in the volume
under review does not successfully argue for their antecedent
probabilities of success being any higher than the results of the
analysis suggest, or, equivalently, that the criteria for size of
effect used were overly stringent.⁹
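
The arithmetic of this paradigm, together with its footnoted extension to investigations in which the null hypothesis is true, can be laid out explicitly. The sketch below simply multiplies numbers of studies by rejection rates; the function and variable names are illustrative, and the share computed at the end is a derived quantity that the text does not itself report.

```python
def expected_significant(n_studies, rejection_rate):
    """Expected number of studies that reject the null hypothesis."""
    return n_studies * rejection_rate


# 100 studies of a real medium effect, with power of about .50.
true_hits = expected_significant(100, 0.50)
# 100 studies where the null hypothesis is true, tested at the .05 level.
false_hits = expected_significant(100, 0.05)

candidates = true_hits + false_hits
share_false = false_hits / candidates

print(true_hits, false_hits, candidates)
print(round(share_false, 3))
```

On these assumptions, roughly 55 of the 200 investigations become candidates for publication, and about one in eleven of those reflects a Type I error.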

On the contrary, there is a line of argument that suggests that the
criteria were not stringent enough. Assume that a medium effect
exists in the population with regard to some psychological construct
or constructs, e.g., a correlation between two (pure factor)
attitudes of .40. By the time we have measured each, the variance of
our scores contains error and other construct irrelevant variance
which serve to attenuate the population effect we seek to a
correlation of perhaps .20 to .30 between fallible attitude scores.
We always must draw inferences from variables containing error and
irrelevant variance while we normally conceptualize our problems in
terms of constructs. The net effect of the fallibility of our
measurement and classification is to attenuate the effects we seek.
Thus the size of effect criteria, relating as they do to fallible
observations, imply even larger construct effects, and from the
viewpoint of the latter, are on the generous side.

⁹ The paradigm can be continued: assume that at the same time
another 100 investigations are undertaken in which, in fact, there
is no effect, i.e., the null hypothesis is true. At the .05 level,
these will contribute, on the average, another five candidates for
publication. This reduces even further the strength of this
argument.

¹⁰ Since the distribution is positively skewed, as evidenced by a
standard deviation of 55, the median would be considerably less than
68.

If we then accept the diagnosis of general weakness of the studies,
what treatment can be prescribed? Formally, at least, the answer is
simple: increase sample sizes. The mean of the maximum sample sizes
used to test major hypotheses in the 70 studies was 68.¹⁰ The power
of a statistical test depends formally on several parameters, but
unless one is to increase the significance level (i.e., increase the
risk of Type I errors) or use directional tests (e.g., a one-sided
test for t), power can generally be increased only by an increase in
sample size. Taking 68 cases, it is instructive (and chastening) to
see how much power they provide for various tests under standard
conditions (.05 significance criterion, nondirectional) assuming the
existence of a medium population effect:

1. t test for a difference between two means. Assuming samples of 34
cases each, the power is .52. If the sample was unequally divided,
say for 50 and 18, power would be only .42.¹¹

2. Normal deviate test for a difference between two proportions.
With two samples of 34 cases, assuming extreme population
proportions, say .70 and .90 (or .10 and .30), power is .57;
assuming population proportions of .40 and .60, power is only .38.

3. Normal deviate test for a difference between r’s. Again dividing
the 68 cases equally for maximum power, with high population r’s,
say .70 and .90, power is .66; with low r’s differing by the same
amount, say .10 and .30, power is only .13!

4. t test that a population r = 0. If it is, in fact, .40 (medium),
68 cases give the high power value of .93. This high power is a
consequence of the definition of medium as .40, rather than a lower
value which compatibility with the other test criteria would dictate
(see above, Size of Effect).

5. Normal deviate test that a population proportion is .50 (sign
test). If the population proportion is actually .70 (or .30), 68
cases give the high power value of .92, provided that the design
yields 68 differences. If, however, the 68 cases are set up to yield
34 matched pairs, n is effectively 34, and power is only .63.

6. F test for k means. Power here depends upon the number of groups.
Assuming three groups of 23 cases each, power is .41; with four
groups of 17 cases each, .36; with seven groups of 10 cases each,
power drops to .31. The F test in the analysis of variance is,
indeed, a most versatile statistical tool (cf. Anderson, 1961) but
investigators may lose sight of the fact that, following the
discovery of a significant F ratio involving several groups, they
are usually left with a multiple comparison problem where the means
are no more stable than the sample size on which each is based.
Thus, if in the last example (seven groups of 10 cases each) F is
found significant, the determination of which group differs
significantly from which then depends on means based on 10 cases.
Even if one then follows the overliberal practice of performing t
tests between pairs of these means at the tabled, but actually
higher, .05 level (using the within-group error term based on 63 df)
the power of each test under medium effect conditions is only .19,
despite the overall F test power of .31!

7a. χ² test that k proportions are equal. This parallels the
situation for F tests, power varying with k. For three groups of 23,
power to detect a medium effect is .38; for four groups of 17, .30;
for seven groups of 10, .21. The same considerations apply when it
is necessary to follow up the overall χ² test.

7b. χ² contingency test. As above, power varies with df. For
example, for the contingency tables illustrated above (Size of
Effect) assuming 68 cases in each, power is as follows: df = 2, .50;
df = 3, .37; df = 4 (both tables) .30.

¹¹ It is demonstrable that for the statistical test of any
difference between or among samples which total n cases, power is at
a maximum when the n cases are equally divided.
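
Several of the values in this list can be approximated with a large-sample normal calculation. The sketch below is not the author's tabling procedure: it assumes a two-sided .05 test, uses the Fisher z transformation (with standard error based on n - 3) for the correlation tests of items 3 and 4, and treats the t test of item 1 as a normal deviate test. All function names are illustrative.

```python
from math import atanh, sqrt
from statistics import NormalDist

Z = NormalDist()


def two_sided_power(delta, alpha=0.05):
    """Power of a two-sided normal deviate test whose statistic has
    expected value delta under the alternative."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return Z.cdf(delta - z_crit) + Z.cdf(-delta - z_crit)


def power_two_means(d, n1, n2):
    """Item 1: standardized mean difference d, group sizes n1 and n2."""
    return two_sided_power(d * sqrt(n1 * n2 / (n1 + n2)))


def power_r_difference(r1, r2, n1, n2):
    """Item 3: difference between two population r's via Fisher z."""
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return two_sided_power(abs(atanh(r1) - atanh(r2)) / se)


def power_r_zero(r, n):
    """Item 4: test that a population r is zero, via Fisher z."""
    return two_sided_power(abs(atanh(r)) * sqrt(n - 3))


print(round(power_two_means(0.5, 34, 34), 2), round(power_two_means(0.5, 50, 18), 2))
print(round(power_r_difference(0.70, 0.90, 34, 34), 2),
      round(power_r_difference(0.10, 0.30, 34, 34), 2))
print(round(power_r_zero(0.40, 68), 2))
```

Under these assumptions the correlation results round to the values given in items 3 and 4, while the two-means results come out about .02 higher than the t-based values in item 1, as would be expected from substituting the normal for the t distribution.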

Given these generally meager power values for 68 cases, it is not
surprising to find a mean power value assuming medium effect size
over the 70 articles of only .48. Are these studies representative
of abnormal-social research undertaken? It follows from our earlier
reasoning that, if anything, published studies are more powerful
than those which do not reach publication, certainly not less
powerful. Therefore, the going antecedent probability of success of
current abnormal-social research is much lower than one would like
to see it, a situation which is capable of improvement by increasing
the size of the samples customarily employed.¹² The investigator on
the track of a subtle issue in the area of subception who plans to
study 30 cases would do well to take heed!

¹² Other means for increasing power: improving experimental design
efficiency and/or experimental control, and renouncing a slavish
adherence to a standard Type I level, usually .05. In some
investigations, an increase in the latter may result in so large an
increase in power as to justify the greater [...]

The consequences of this state of affairs are 
fairly obvious. If many investigators are run- 
ning high risks of failing to detect substantial 
population effects, much research is resulting 
in spuriously “negative” results. One can 
only speculate on the number of potentially 
fruitful lines of investigation which have been 
abandoned because Type II errors were made, 
a situation which is substantially remediable 
by using double or triple the original sample 
size. A generation of researchers could be 
profitably employed in repeating interesting 
studies which originally used inadequate 
sample sizes. Unfortunately, the ones most 
needing such repetition are least likely to 
have appeared in print 

It is quite likely that similar conditions 
prevail in other areas of psychological re- 
search. It is recommended that psychological 
investigators attend to issues of power in their 
planning of experiments, and that the defini- 
tions of size of effect employed in this survey 
be used conventionally. In the absence of any 
basis for specifying an alternative to the null 
hypothesis for purposes of power analysis, 
the criterion values for a medium effect 
(Table 2) are offered as a convention 


SUMMARY 


The purpose of the study was to survey 
the articles of the Journal of Abnormal and 


Social Psychology, 1960, 61, from the point 
of view of the power of their statistical tests 
to reject their major null hypotheses, for 
defined levels of departure of population para- 
meters from null conditions, i.e., size of ef- 
fect. Conventional test conditions were em- 
ployed in power determination: nondirectional 
tests at the .05 level of significance. 

For this purpose, extensive tables for the 
common statistical tests were prepared from 
which the power of a test could be read as a 
function of sample size and size of effect. 
From these tables, the power to detect small, 
medium, and large effects of each statistical 
test employed in each article was determined, 


Type I risk. However, normal scientific conservatism 
would not tolerate too long a trip on this road 
Increased sample size is likely to prove the most 
effective general prescription for improving power. 
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and the mean power values for the major 
tests of each article were used to characterize 
that article. The distributions of these values 
were presented and summarized. 

It was found that the average power (prob- 
ability of rejecting false null hypotheses) 
over the 70 research studies was .18 for small 
effects, .48 for medium effects, and .83 for 
large effects. 
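
To make the link between these power values and sample size concrete, the sketch below estimates how many cases per group a nondirectional two-sample comparison at the .05 level would need. It rests on several assumptions not stated in this section: a normal approximation, a target power of .80, and standardized mean differences of 0.25, 0.50, and 1.00 standing in for small, medium, and large effects. The function name is illustrative.

```python
from math import ceil
from statistics import NormalDist


def n_per_group_for_power(d, power=0.80, alpha=0.05):
    """Cases needed in each of two groups (normal approximation, two-tailed)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / d) ** 2)


# Assumed standardized mean differences; illustrative values only.
assumed_effects = {"small": 0.25, "medium": 0.50, "large": 1.00}
required = {label: n_per_group_for_power(d) for label, d in assumed_effects.items()}
print(required)
```

Because the required number of cases varies with the inverse square of the assumed effect, halving the effect roughly quadruples the sample needed.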

These values are deemed to be far too
small and suggest that much research in the
abnormal-social area has led to the failure
to reject null hypotheses which are in fact
false. This in turn may have led to frequent
premature abandonment of useful lines of
investigation.

Since power is a direct monotonic function
of sample size, it is recommended that in-
vestigators use larger sample sizes than they
customarily do. It is further recommended
that research plans be routinely subjected to
power analysis, using as conventions the
criteria of population effect size employed in
this survey.


THE INFLUENCE OF RESPONSE CONDITIONS ON RECOGNITION
THRESHOLDS FOR TABU WORDS¹


FRED H. NOTHMAN 


American University 


1 This paper is based on a portion of a doctoral
dissertation submitted to the Department of Psy-
chology, Indiana University. The writer wishes to
express his sincere appreciation to H. G. Yamaguchi,
Chairman, and J. A. Dinsmoor and L. H. Levy,
committee members, for their encouragement and
constructive comments during the course of this re-
search. Special thanks to J. Binder for the use of his
tachistoscope.


A perceptual defense hypothesis was first
formulated by Bruner and Postman (1947),
and Postman, Bruner, and McGinnies (1948),
to explain the higher recognition thresholds
that were found in the case of emotionally
charged or negatively valued words. They sug-
gested that a person “unconsciously” attempts
to delay recognition of “anxiety laden” stimuli
as long as possible. But they also recognized
that this formulation raised the question:
“How does the subject ‘know’ that a word
should be avoided? In order to ‘repress’ he
must first recognize it for what it is” (p. 153).
McGinnies tried to resolve this dilemma.

In a study in which he compared recog-
nition thresholds for socially tabu and socially
neutral words, McGinnies (1949) found that
tabu words were more difficult to recognize
than neutral words, and that GSRs prior to
the recognition of tabu words were greater
than prior to the recognition of neutral words.
McGinnies postulated the operation of some
special perceptual mechanisms of an organ-
ismic nature to account for the process of
perceptual defense.

Howes and Solomon (1950; Solomon &
Howes, 1951) suggested a more plausible and
more parsimonious way of accounting for
McGinnies’ (1949) results, by pointing out
that the tabu words used by McGinnies were,
on the basis of the Thorndike-Lorge (1944)
semantic count, much less familiar words, and
that McGinnies’ experimental situation of
“scientific respectability” would tend to “set”
subjects to inhibit overt verbal reports of
tabu words, until they were “more cer-
tain before reporting it” (Howes & Solomon,
1950, p. 233). Thus perceptual defense, or
whatever is left of it after equating for word
familiarity, would be explainable in terms of
selective reporting or response inhibition.
Postman (Postman, Bronson, & Gropper,
1953) experimentally tested some of the Solo-
mon-Howes’ assertions. He equated McGin-
nies’ tabu words with neutral words of equal
familiarity on the basis of the Thorndike-
Lorge (1944) semantic count. Postman’s data
show absence of the perceptual defense phe-
nomenon. However, he had introduced some
additional changes from the McGinnies’ pro-
cedure. He gave advance information as to
the kinds of stimuli to be shown tachisto-
scopically. Several other investigators (Cowen
& Beier, 1950; Cowen & Obrist, 1958; Free-
man, 1954; Lacy, Lewinger, & Adamson,
1953) have discovered that advance informa-
tion regarding tabu word stimuli will signifi-
cantly reduce or eliminate the perceptual de-
fense phenomenon. Postman also changed the
oral mode of responding used in McGinnies’
study by obtaining written responses from
groups of subjects. Thus his work does not
constitute a test of the McGinnies’ position.
The present experiment is similar in most
respects to the earlier work by McGinnies
(1949) and Postman et al. (1953), but in
this study other modes of responding are used
in addition to the oral one employed by Mc-
Ginnies and the written one employed by
Postman. Under these treatment conditions
and on the basis of the previous considera-
tions the following predictions are made: If,
as McGinnies believes, the process of percep-
tual defense is mediated by modification of
visual perception in response to inimical or
tabu stimuli, then the institution of different
overt behavioral response conditions should
not be expected to bring about any changes
in the perceptual defense phenomenon, since
distortion occurs at the perceptual level, and
not at the level of overt responding. If, on
the other hand, the process of “perceptual
defense” is mediated through response inhibi-
tion, then variations in response conditions
may be expected to affect this response inhi-
bition.

In this experiment, the method by which
the subject identifies the word consists of
(a) oral responding, that is, pronouncing the
word in full; (b) writing the whole word;
(c) spelling parts of the word orally; and (d)
spelling parts of the word in writing.

It appears that not only response condi-
tions but also the sex of subjects may be re-
lated to differences between neutral and tabu
word recognition thresholds. Some investi-
gators report that women show significantly
greater differences between neutral and tabu
word recognition thresholds than men (Cowen 
& Obrist, 1 Freeman, Postman 
et al., 1953). On the other hand, McGinnies 
(1949) and Cowen and Beier (1954) 
such differences. In view of these conflicting 
it seemed important to allow an 
in the present

spelling the word components orally; Group IV,
spelling the word components in writing.

Each group consisted of 21 subjects, 15 males and
6 females. The subjects were randomly assigned to
the four response groups with the restriction that
each group gain one subject per multiple of four
subjects of each sex assigned.

Thresholds were obtained first on four of the five 
training words. The training words were followed by 
eight test words, four tabu and four neutral. Finally 
every subject was presented with one more neutral 
word. 

The method of selection of sequences of training 
words was as follows: 15 different training word se- 
quences were randomly chosen. Each sequence was 
used once in each response group of 15 males. Six of 
the 15 word sequences were each used once in each 
response group of females.

The method of selection of neutral and tabu words 
for presentation to a subject was as follows: of the 
12 test words used in this experiment, each subject 
was given a randomly selected sample of 8, 4 neutral 
and 4 tabu words, with the restrictions that (a) no
subject receive the same neutral or tabu word twice; 
(b) for each subject, the neutral words in his par- 
ticular sample be those matched in frequency value 
to the tabu words selected; and (c) for the first 
three male subjects and every subsequent three male 
subjects assigned to a treatment group, each test 
word be used twice. The same restriction is applied 
to the females in each treatment group. Thus in a 
group of 21 subjects (15 males, 6 females), each test 
word was used 14 times (10 times for males, 4 times 
for females). 
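
The counting in restriction (c) can be checked with a small sketch. One way to satisfy it, which is an assumption of this example and not necessarily the procedure the experimenter followed, is to split the six tabu words at random into three pairs and let each of three subjects omit a different pair. The word labels and function name are hypothetical.

```python
import random

# Hypothetical labels standing in for the six tabu words; each is assumed to
# have one frequency-matched neutral partner, per restriction (b).
TABU = ["tabu_1", "tabu_2", "tabu_3", "tabu_4", "tabu_5", "tabu_6"]
NEUTRAL_MATCH = {t: t.replace("tabu", "neutral") for t in TABU}


def assign_block(rng):
    """Assign test words to one block of three same-sex subjects.

    The six tabu words are split at random into three pairs and each subject
    omits one pair, so every subject gets four tabu words and every word is
    used exactly twice within the block.
    """
    shuffled = TABU[:]
    rng.shuffle(shuffled)
    omitted_pairs = [shuffled[0:2], shuffled[2:4], shuffled[4:6]]
    block = []
    for omitted in omitted_pairs:
        tabu = [w for w in TABU if w not in omitted]
        neutral = [NEUTRAL_MATCH[w] for w in tabu]
        block.append({"tabu": tabu, "neutral": neutral})
    return block


rng = random.Random(0)                                  # fixed seed for repeatability
male_blocks = [assign_block(rng) for _ in range(5)]     # 15 males
female_blocks = [assign_block(rng) for _ in range(2)]   # 6 females
uses = {w: 0 for w in TABU}
for block in male_blocks + female_blocks:
    for subject in block:
        for w in subject["tabu"]:
            uses[w] += 1
print(uses)
```

Five male blocks and two female blocks give each word 10 uses among males and 4 among females, 14 in all, which agrees with the totals stated above.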

Following the selection of test words, the order of 
presentation was randomly determined. This order 
was different for each one of the 15 male subjects in 
a group. The same 15 sequences were used for all 
groups, with the restriction that not more than two 
neutral or two tabu words follow each other. Six of 
the 15 sequences selected for males were used for fe-
males. The same 6 sequences were repeated in each
treatment group of females. 

After presentation of the training words, the test 
words were first shown 60 milliseconds below the 
lowest threshold obtained on the training words. 

Intertrial intervals were approximately 10-15 sec- 
onds. 

For the determination of recognition thresholds, a 
modified method of limits was used by employing an 
ascending order of presentation only. Step intervals 
were 10 milliseconds. Every stimulus word was pre-
sented until the recognition threshold was reached.
For each additional presentation of a stimulus word,
the exposure time was increased by 10 milliseconds.

The criterion of threshold was one correct spoken, 
written, or spelled identification of the word 
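
The ascending procedure just described can be summarized in a few lines. The sketch below is illustrative only: the subject is simulated by a hypothetical rule, and the upper limit on exposure time is a safety cap added for the example, not part of the reported method.

```python
def ascending_threshold(start_ms, is_correct, step_ms=10, max_ms=2000):
    """Return the first exposure duration (ms) at which the word is identified.

    is_correct is a callable taking the exposure in ms and returning True when
    the subject's spoken, written, or spelled response is correct. max_ms is a
    safety cap added for this sketch only.
    """
    exposure = start_ms
    while exposure <= max_ms:
        if is_correct(exposure):
            return exposure            # criterion: one correct identification
        exposure += step_ms            # ascending series only
    return None                        # never identified within the cap


# Hypothetical subject: lowest training threshold 140 ms, so the test word
# starts 60 ms lower; this simulated subject succeeds from 120 ms upward.
lowest_training_threshold = 140
start = lowest_training_threshold - 60
threshold = ascending_threshold(start, lambda ms: ms >= 120)
print(threshold)
```

For this simulated subject the series runs 80, 90, 100, 110, 120 milliseconds and stops at the first correct identification.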

The use of cards was necessitated by the response 
requirements of this experiment; that is, writing and 
spelling. In the course of establishing recognition 
thresholds for the training and test words, these 
cards were used in all groups under the following 
conditions:

Group I—Pronouncing the Whole Word (Oral-
Whole). Following each presentation of the stimulus
word, the subject received a 3 inch × 5 inch card
(described under Materials) and was instructed to
look at the card while pronouncing the word he saw
or thought he saw previously exhibited in the ta-
chistoscope. Subjects in Group I were given cards
for the methodological purpose of equalizing the con-
ditions for the four treatment groups with respect to
the handling of cards and the different eye adjust-
ments necessary for looking through the tachistoscope
and looking at the cards.

Group II—Writing the Whole Word (Written-
Whole). Following each presentation of the stimulus
word, the subject received a card and was instructed
to look down at the card and to write on the card
the word he saw or thought he saw previously ex-
hibited in the tachistoscope.

Group III—Spelling the Word Components Orally
(Oral-Components). Following each presentation of
the stimulus word, the subject received a card and
was instructed to look down at the card while pro-
nouncing the letters of the word he saw or thought
he saw previously exhibited in the tachistoscope. The
cards the subject received had the numbers 1 through
5 written on them; these numbers corresponded to
the five letters of the word presented. Any one card
had, on the basis of random determination, either
two or three letters on it. The cards were made up
in pairs, so that if a subject were first presented with
a two-number unit, this was followed by a card
bearing the three remaining numbers. Thus all the
letters of the word were called for over every pair
of odd-even trials.

Group IV—Spelling the Word Components in
Writing (Written-Components). Following each pres-
entation of the stimulus word, the subject received a
card and was instructed to look down at the card
and to write on the card the letters of the word he
saw or thought he saw previously exhibited in the
tachistoscope. The cards the subject received had the
numbers 1 through 5 written on them; these num-
bers corresponded to the five letters of the word
presented. Any one card had, on the basis of ran-
dom determination, two or three letters on it. The
cards were made up in pairs, so that if a subject
were first presented with a two-number unit, this
was followed by a card bearing the three remaining
numbers. Thus all the letters of the word were called
for over every pair of odd-even trials.

The cards were delivered through the chute in
front of the tachistoscope following presentation of
a stimulus word. After responding the subjects re-
turned the cards to the experimenter via the chute at
the side of the tachistoscope.


RESULTS 


The primary data of the present experiment
consist of the four thresholds for tabu words
and four thresholds for neutral words for each
subject. The group means, in milliseconds, of
the thresholds for both types of words for
both sexes separately and pooled are pre-
sented in Table 2.

TABLE 2

MEAN THRESHOLDS IN MILLISECONDS FOR NEUTRAL AND TABU WORDS BY GROUPS AND SEXES

[Table body not recoverable. Rows: Male, Female, Combined; columns: Neutral and Tabu for Groups I-IV. Three columns survive without their headings (Male, Female, Combined in each): 138.2, 109.6, 130.0; 152.8, 113.3, 141.6; 104.0, 148.8, 116.7.]

The first question asked of these data was
whether or not there was a significant differ-
ence between the groups with respect to neu-
tral word thresholds. This question is of some
theoretical importance, because if there were
a significant difference between the groups
for both tabu and neutral word thresholds,
any resultant differences in the perceptual de-
fense measure could not be ascribed exclu-
sively to either response inhibition or percep-
tual alteration.

Accordingly an analysis of variance was
made of the neutral words thresholds for the
four groups with sexes pooled. A Bartlett’s
test indicated a significant lack of homo-
geneity of variance, the chi square of 10.744
for 3 df being significant beyond the 5% 
level. A Kruskal-Wallis nonparametric one- 
way analysis of variance (Siegel, 1956) was 
made and resulted in an H of 3.843. This 
value of H is not significant at the 5% level
(7.82 for 3 df). 
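
For readers unfamiliar with the Kruskal-Wallis statistic, the sketch below shows how H is formed from ranks and compared with the tabled value of 7.82 for 3 df. The threshold values are made up for illustration and are not the experimental data; ties receive midranks, and no tie correction is applied to H.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H for a list of samples, using midranks for ties.

    No correction for ties is applied to H itself.
    """
    pooled = sorted(value for g in groups for value in g)
    n_total = len(pooled)
    # Midrank of each distinct value (ranks start at 1).
    rank_of = {}
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2.0
        i = j
    sum_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    return 12.0 / (n_total * (n_total + 1)) * sum_term - 3 * (n_total + 1)


# Made-up thresholds (ms) for four small groups, for illustration only.
example = [[100, 120, 130], [110, 150, 160], [90, 140, 170], [105, 125, 180]]
h = kruskal_wallis_h(example)
print(round(h, 3), h > 7.82)
```

An H below the tabled value, as with the 3.843 reported above, gives no ground for rejecting the hypothesis that the groups share a common distribution.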

The data used in the statistical evaluation
to follow are mean differences in milliseconds
between tabu and neutral word thresholds.
These differences were obtained by averaging
the four thresholds for tabu words and the
four thresholds for neutral words for each
subject, and taking the differences between
the two averages. When the average tabu
word threshold was smaller than the average
neutral word threshold, a minus sign was as-
signed to the difference.

The mean differences between neutral and
tabu word recognition thresholds and the
standard deviations of the differences for the
various groups are presented in Table 3. The
means are smaller for Groups II, III, and IV
than for Group I in the case of both sexes;
this indicates that the perceptual defense
phenomenon measured by neutral-tabu word
threshold differences decreased when pro-
cedures other than the customary oral re-
sponse were employed. It is to be noted also
in Table 3 that the average threshold differ-
ences of females are higher in all treatment
groups than those of the males.

TABLE 3

MEANS AND STANDARD DEVIATIONS OF AVERAGE THRESHOLD DIFFERENCES IN MILLISECONDS BETWEEN NEUTRAL AND TABU WORDS

[Table body not recoverable. Surviving entries: 26.7, 21.4, 29.0, 11.3, 24.8, 15.2, 26.0, 24.8, 26.4, 6.7, 11.3, 8.0.]

Prior to testing for statistical significance,
the data were examined to determine whether
average differences between neutral and tabu
word recognition thresholds were independent
of the height of average thresholds, and also
whether homogeneity of variance was within
acceptable limits.

TABLE 4

CORRELATION BETWEEN AVERAGE THRESHOLDS AND AVERAGE THRESHOLD DIFFERENCES

[Table body not recoverable. Rows: Group I (Oral-Whole), Group II (Written-Whole), Group III (Oral-Components), Group IV (Written-Components). Surviving entries: 1.108, 0.656, 0.695, 0.800.]

Accordingly, product-moment correlation co- 
efficients were computed for each treatment 
group by correlating average threshold differ- 
ence between tabu and neutral words with 
average tabu and neutral word thresholds.
These correlation coefficients are presented in 
Table 4. None of these correlation coefficients 
were significant at the 5% level, indicating 
that the mean difference score is independent 
of the magnitude of the average threshold. 
Next, homogeneity of variance was tested
by Bartlett’s test. For the four treatment
groups with sexes pooled, a chi square of
956 was obtained, which is not significant
at the 5% level (7.815 for 3 df). Another
homogeneity of variance test was made after
dividing each treatment group by sex of sub-
ject. The resulting chi square based on eight
groups was I( 343, which again is not sig-
nificant at the 5% level (14.067 for 7 df).

Next, a factorial analysis of variance was
made by using a triple 2 × 2 × 2 classifica-
tion (Edwards, 1950). The summary of this
analysis is exhibited in Table 5. Each of the
three variables had a significant effect on
recognition thresholds. Specifically, significant
differences are obtained at better than the
.001 level between the oral (Groups I and
III) and written (Groups II and IV) modes
of response; and significant differences at bet-
ter than the .025 level were obtained between
the whole (Groups I and II) and components
(Groups III and IV) modes of response.
Males and females differ significantly at bet-
ter than the .025 level in respect to mean
threshold differences.

Testing for interaction effects between the
experimental variables indicates that a sig-
nificant interaction exists at better than the
.005 level between oral-written and whole-
components conditions (A × B). The inter-
actions between sexes and modes of response
(A × C, B × C) are not significant, indicat-
ing that the differential responding of the
sexes to tabu and neutral words was not de-
pendent on the different modes of response.

The triple interaction between sexes, writ- 
ten-oral, and whole-components conditions is 
likewise not significant. This indicates that 
changes in any one of these variables are in 
dependent of changes in the other two. 
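
The significance statements above can be checked against the sums of squares in Table 5. The sketch below recomputes the F ratios on the assumption that each effect in the 2 × 2 × 2 classification has 1 df and that the within-groups term has 76 df (84 subjects in 8 group-by-sex cells). Two entries are garbled in the scanned table; the readings used here are the ones under which the column adds up to the printed total.

```python
# Sums of squares as read from Table 5 (each effect has 1 df in a 2 x 2 x 2
# design). The A x B and B x C entries are restored readings, chosen so that
# the column sums to the printed total.
effects = {
    "A (oral vs. written)": 6300.670,
    "B (whole vs. components)": 3188.170,
    "C (male vs. female)": 3124.286,
    "A x B": 4914.360,
    "A x C": 450.267,
    "B x C": 1728.601,
    "A x B x C": 274.306,
}
ss_within = 38548.730
ss_total = 58529.390
df_within = 84 - 8          # 84 subjects in 8 group-by-sex cells

ms_within = ss_within / df_within
# With 1 df per effect, each mean square equals its sum of squares.
f_ratios = {name: ss / ms_within for name, ss in effects.items()}

for name, f in f_ratios.items():
    print(f"{name:26s} F(1, {df_within}) = {f:6.2f}")
```

The within-groups mean square recovered this way matches the printed 507.220, which supports the readings adopted for the damaged entries.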

TABLE 5

ANALYSIS OF VARIANCE OF AVERAGE DIFFERENCES BETWEEN TABU AND NEUTRAL WORD RECOGNITION THRESHOLDS

Source                               SS           df   MS
Oral versus Written (A)              6300.670      1   6300.670***
Whole versus Components (B)          3188.170      1   3188.170*
Male versus Female subjects (C)      3124.286      1   3124.286*
A × B                                4914.360      1   4914.360**
A × C                                 450.267      1    450.267
B × C                                1728.601      1   1728.601
A × B × C                             274.306      1    274.306
Within groups                      38,548.730     76    507.220
Total                              58,529.390     83

* Significant at better than the .025 level.
** Significant at better than the .005 level.
*** Significant at better than the .001 level.

In order to determine whether the percep-
tual defense phenomenon, as measured by the
mean difference between tabu and neutral
word thresholds, was present or absent in the
various treatment groups, the mean difference
for each group was tested by a t test against
a theoretical mean of zero. Three t’s were
computed in each treatment group, one for
each sex and one for the combined sexes.
The results of these t tests are presented in
Table 6.

TABLE 6

t TESTS OF THE DIFFERENCES BETWEEN THE AVERAGE TABU AND NEUTRAL WORD THRESHOLDS

[Table body not recoverable. Rows: Group I (Oral-Whole), Group II (Written-Whole), Group III (Oral-Components), Group IV (Written-Components). Surviving entries: 0.629, 2.009, 0.941, 3.651**, 3.239.]

Males show significant mean differences
between tabu and neutral word thresholds at
better than the .001 level only in Group I
(oral-whole). Males in the other three groups
did not attain a significant t. But one should
note that Group III (oral-components) just
fails to reach significance at the .05 level.

Females show significant mean differences
between tabu and neutral word thresholds at
better than the .02 level in Group I (oral-
whole), and at better than the .05 level in
Group II (written-whole).

The combined t’s for males and females of
the different treatment groups indicate that
Group I shows significant mean threshold
differences between tabu and neutral word
thresholds at better than the .001 level, and
Group III at better than the .02 level. Sig-
nificant differences were not obtained be-
tween tabu and neutral word thresholds for
Groups II and IV.


DISCUSSION 


The results support the response inhibition
hypothesis rather than a perceptual defense
hypothesis, since they show that the use of
different response conditions had the effect of
significantly lowering mean threshold differ-
ences between tabu and neutral words with-
out producing significant differences in neu-
tral word thresholds. The important and de-
termining role that response conditions play
in the evocation of the perceptual defense
phenomenon is pointed up by these findings.
A concept of perceptual defense becomes a
misnomer in that the differential height of
tabu to neutral word thresholds is specific
to the response conditions employed, and
that these mean differences can be satisfac-
torily interpreted in terms of response inhi-
bition.

A major finding of this study is that Post-
man’s and McGinnies’ results are not irrecon-
cilable. When the response condition con-
sisted of words reported orally and in full, as
in Group I, the results follow closely McGin-
nies’ (1949) findings. But when the response
conditions were made more like those that
prevailed in Postman’s (Postman et al., 1953)
experiment, by instituting written response
conditions, as in Groups II and IV, the re-
sults follow more closely those reported by
Postman; namely, the perceptual defense
phenomenon disappeared. The data at hand
show, therefore, that substitution of different
response conditions resolves the seemingly
contradictory results of McGinnies and Post-
man. Furthermore, the principle of response
inhibition can cover both situations parsi-
moniously.

With respect to the sex variable, no dif-
ferential effect was found when different re-
sponse conditions were used; the analysis of
variance indicated that the interaction be-
tween sexes and modes of response is not sig-
nificant. However, the sex variable, with re-
sponse conditions pooled, does have a signifi-
cant effect on recognition thresholds in that
females show significantly higher mean dif-
ferences between neutral and tabu words than
males.

These results in respect to the sex variable 
are in agreement with the findings of other in- 
vestigators (Cowen & Obrist, 1958; Freeman, 
1955; Postman et al., 1953). It is uncertain 
what may cause these sex differences. It has 
been suggested that the sex of the experi- 
menter (male) in an experiment where fe- 
male subjects are used may be a variable, but 
Postman, who investigated the experimenter 
variable by using both male and female ex- 
perimenters, did not find any significant in- 
teraction between sex of experimenter and 
sex of subject. Postman has suggested that 
females may be less familiar with tabu words, 
and less ready to report them; that is, they 
require a higher degree of certainty. Even so, 
the underlying causes for the behavior noted 
still remain to be specified. 

In the light of the results of this experi- 
ment, response inhibition may be viewed as a 
complex function of oral versus written and 
whole versus components conditions, and the 
interaction between them. It would seem that 
the optimal conditions for the evocation of 
response inhibition, in a perceptual defense 
situation, are oral responding and whole word 
responding; and the minimal conditions for 
the evocation of response inhibition are writ- 
ten responding and spelling components of 
words. 

An explanation for these findings may be 
sought in the cultural conditioning of the sub- 
jects. It is conceivable that in the course of 
his reactional biography, the individual in our 
culture learns that the consequences of speak- 
ing tabu words are much more severe than 
the consequences of writing or spelling or ex- 
pressing tabu words in some other indirect 
fashion. 

SUMMARY 

It was hypothesized that the perceptual de-
fense phenomenon can be accounted for on
the basis of response inhibition. Therefore,
it was predicted that under response condi-
tions other than the customary pronouncing
of neutral and tabu words in full, the differ-
ences between neutral and tabu word thresh-
olds would significantly diminish.

The effects of four modes of response on
recognition thresholds were investigated by
using four groups consisting of 21 subjects
each (15 males, 6 females). The modes of
response were (a) the customary oral re-
sponding, that is, pronouncing in full the ta-
chistoscopically presented word; (b) writing
the whole word; (c) spelling the word com-
ponents orally; and (d) spelling the word
components in writing.

The test stimuli consisted of six tabu words 
and six neutral words (equated in famili-
arity), and were exhibited tachistoscopically.
Recognition thresholds for the words pre- 
sented were determined by a modified method 
of limits. 

The findings are: 

1. Oral responding resulted in significantly
greater mean differences between neutral and
tabu word thresholds than written responding.

2. Whole word responding resulted in sig-
nificantly greater mean differences between
neutral and tabu word thresholds than re-
sponding by spelling components of words.

3. Females showed significantly greater
mean differences between neutral and tabu
word thresholds than males.

4. The interaction effects between oral ver-
sus written and between whole versus com-
ponents were significant.

5. The interaction between the sexes and
modes of response was not significant.

The results are interpreted as lending sup-
port to the response inhibition hypothesis.


SOCIAL DESIRABILITY OF SELF-REFERENCE
STATEMENTS AS A FUNCTION OF FREE
ASSOCIATION PATTERNS


ROBERT F. TERWILLIGER


Douglass C


In considering a verbal statement as a 
stimulus, it is apparent that there are many 
dimensions upon which one can scale or cate- 
gorize such stimuli. This paper will be con- 
cerned with the social desirability of the idea 
or characteristic expressed by the statement 
(Anastasi, 1961, Ch. 18; Edwards, 1957). 
Social desirability will be defined as being a 
particular type of affect. Such a definition is 
justifiable since such social desirability judg- 
ments appear to have an affective tone, that 
is, statements are scaled with respect to their 
social acceptability (goodness) or social un- 
acceptability (badness). This affective tone 
may not be perfectly related to the subject’s 
own feelings about the statement. Rather, he 
may react to some abstract notion of how the 
average man feels. Nevertheless, the judg- 
ments are made with respect to an affective 
characteristic of the statement. 

There are certain known variables which 
appear to determine the affect of verbal 
stimuli. Johnson, Thomson, and Frincke 
(1960) have shown that the pleasantness of 
a nonsense syllable will be greater as it be- 
comes more familiar, where a familiar syl- 
lable is one which has a greater frequency of 
occurrence. Frequency of occurrence is a 
variable which enters into most, if not all, of 
the so-called associationistic theories. This, 
then, suggests the possibility that there may 
be other “associationistic” variables which re- 
late to the variable of affect and to the par- 
ticular type of affect known as social desir- 
ability. 

A study has been reported in which it was 
shown that Semantic Differential responses to 
verbal stimuli were in part related to the dis- 
tributions of free associations to the words in 
question (Terwilliger, 1962). The distribu- 
tions of free associations were described by 
means of the information statistic H. This 
statistic describes both the number of asso-
ciations and the probability of each, H in-
creasing as the number of associations in-
creases and as their probabilities become
more equal. For present purposes the impor-
tant result from that study was this: words
with high H values were given more polar¹
responses on the Semantic Differential than
were words with low H values. The results of
the factor analyses of the Semantic Differen-
tial reported by Osgood, Suci, and Tannen-
baum (1957) show that “evaluation” (affect)
was loaded highest of all the semantic fac-
tors. This suggests that most of the polarity
reported in Terwilliger’s study was affective
polarity. This would mean that a word which
has a large number of free associations and/or
equiprobable associations would have high
affect—without specifying whether this affect
was positive or negative. Therefore, one could
conclude that free associations, as described
by the H statistic, relate to affect, but no hy-
potheses yet exist which allow one to predict
the type of affect, positive or negative.

1 Polarity refers to the absolute rating of a word
on a Semantic Differential scale. A highly polar word
is given a rating farther from the neutral point on a
scale than is a less polar word. Polarity can be con-
ceived of as reflecting the amount of meaning of a
word in so far as it indicates how far that word lies
from the neutral or “no meaning” point.

Before proceeding to propose such hypothe-
ses, two psychological variables must be pre-
sented and defined. It is assumed that verbal
stimuli have many dimensional attributes
which are psychologically relevant in that
these dimensions can be used as the basis for
discriminations or choices. Of these dimen-
sions, two shall be considered. These will be
called ambiguity and balance. In conventional
terms, the ambiguous word has more different
meanings. One verbal stimulus will be said to
be more ambiguous than another when that
stimulus elicits or tends to elicit more dif-
ferent responses than the other. When the
meanings of a word are equal in strength, it
will be said to be perfectly balanced. A ver- 
bal stimulus will be said to be better balanced 
than another if the responses elicited by it 
are more equal in probability of occurrence 
than in the other. The perfectly balanced 
word elicits responses all of which have an 
equal probability of occurrence. It should be 
apparent that the above definitions can be 
applied to any sort of verbal stimulus. They 
are equally applicable to single words, sen- 
tences, or even complete literary works. From 
this time on, the discussion will be restricted 
to single sentences, although the theory ex- 
younded should be applicable to other types 
yf verbal stimuli as well 

It is assumed that all verbal stimuli may be placed on the dimensions of ambiguity and balance and also on the dimension of affect. It is proposed that the following relationships will hold between these variables. Considering verbal stimuli which increase in ambiguity, one should find that they also decrease in positive affect; that is, given a pair of statements, the less ambiguous will be judged the more pleasant, and the greater the difference in ambiguity, the more often such judgments will occur when a large number of such judgments are made. Considering verbal stimuli which increase in balance, one should find that they also increase in positive affect. Given a choice between two verbal stimuli, the better balanced one will be chosen as more pleasant, and the greater such differences in balance, the more often such choices will occur when a large number of such judgments are made.

Three psychological variables (affect, ambiguity, and balance) have now been postulated and two relationships between them hypothesized. In order to substantiate these hypothesized relationships, which is the purpose of this study, the variables must be related to empirical manipulations. It is assumed that the affect of a verbal statement can be assessed by instructing a subject to perform a typical pair-comparison social desirability scaling, requiring him to check the more desirable statement in each of several pairs of statements. If such measurements are made over a large number of subjects, it will be assumed that the proportion of times a statement was checked is an increasing function of the desirability or affect of the statement. The affect of an item with respect to another specific item will be reflected by the proportion of times that item was selected over the other.

In order to measure ambiguity and balance, 
the nature of a particular statistic, that for 
information, must be considered. The H or 
information statistic is defined as follows: 


$$H = -\sum p \log_e p$$


A brief consideration of this statistic shows 
that there are two variables operating to 
determine H: p or the probability of each 
response and N or the number of different 
responses, which enters in as the summation 
sign. What enters into the information equa- 
tion as an undefined probability of response 
will be, for purposes of this study, the prob- 
ability of giving a particular free association. 
As normative data indicate (Kent & Rosanoff, 
1910), a given verbal stimulus will elicit 
many different associations, and these as- 
sociations will differ in their frequency of 
occurrence. Thus for any given verbal stimulus
there will be N different free associations,
each with a probability p. By entering the
probability of each association into the above
equation and summing over all associations
to the given stimulus, the distribution of free
associations to a given verbal stimulus can be
described by means of the H statistic, as has
been done in previous studies (Ross & Levy,
1960; Terwilliger, 1962).
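The computation just described can be sketched in a few lines. This is only an illustration under stated assumptions: the function names are hypothetical, the natural logarithm is used because the equation is printed with a log-e subscript (another base would change only the unit of H), and the sample probabilities are the ones used in the paper's footnote on two-response stimuli.

```python
import math
from collections import Counter


def information_h(probabilities, base=math.e):
    """Return H = -sum(p * log p) over the nonzero response probabilities."""
    return -sum(p * math.log(p, base) for p in probabilities if p > 0)


def h_and_n_from_responses(responses, base=math.e):
    """Return (H, N) for a list of free associations given to one stimulus.

    N is the number of different responses; each p is a relative frequency.
    """
    counts = Counter(responses)
    total = sum(counts.values())
    probabilities = [count / total for count in counts.values()]
    return information_h(probabilities, base), len(counts)


# Two stimuli with two responses each: unequal versus equal probabilities.
h_unbalanced = information_h([0.75, 0.25])
h_balanced = information_h([0.50, 0.50])

# Three equally likely responses versus two equally likely responses.
h_three_equal = information_h([1 / 3, 1 / 3, 1 / 3])
```

In this sketch the balanced two-response stimulus yields the larger H, and three equiprobable responses yield a larger H than two, which is the pattern the text attributes to balance and ambiguity respectively.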

The H value of a verbal stimulus will increase under two conditions: as the number of different associations (N) increases and as the probabilities (p) of these associations become more equal.² It is assumed that N will be an increasing function of the ambiguity of the stimulus since N indicates the number of responses to the stimulus and ambiguity increases as the number of responses increases. The equality of p is an increasing function of the balance of the stimulus since a balanced stimulus has responses of equal probability. It follows that an increase in H measures an increase in ambiguity and/or an increase in balance.³ N and the equality of p may be varied independently of each other by (a) holding H constant and varying N and (b) holding N constant and varying H. A variation of H with N constant can only come about if the relative sizes of the p values with respect to each other change, and variation of N with H constant assures that N changes while the relative sizes of the various p values remain the same. Both H and N can be determined empirically for any given verbal stimulus by determining the number of free associations to the stimulus and the probability of each and entering these into the equation for the H statistic. The effects of p equality and N on affect can be determined by either statistically or empirically holding H constant and varying N, and holding N constant and varying H with respect to affect. It will be via these means that the hypotheses proposed above shall be tested.

² A stimulus which calls out three different responses will have a higher H value than one which calls out only two responses. Or, one could be given two stimuli, each of which called out two responses. If these responses had probabilities of occurrence of .75 and .25 for the first stimulus and .50 and .50 for the second stimulus, the latter would have the higher H value.
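A short worked bound may help show why H can still vary when N is fixed. The notation follows the H equation given earlier; the bound itself is a standard property of the information statistic, added here as a supplementary fact rather than something stated in the study.

$$0 \le H = -\sum p \log_e p \le \log_e N, \qquad H = \log_e N \ \text{only when every}\ p = 1/N.$$

With N fixed, H therefore moves only with the equality of the p values, which is the sense in which H is read as balance once N is held constant.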

The concepts of ambiguity and balance are individual variables, that is, any given person should have different amounts or degrees of these variables for any given verbal stimulus. To measure ambiguity and balance for a given individual, it would be necessary to have him give many free associations to any given stimulus and rate these associations in such a way that their probability of occurrence could be estimated. It will be assumed that as a cultural tool which serves the purpose of communication, linguistic units will have very similar “meanings” to all members of that culture. Then it is assumed that the ambiguity and balance of a verbal stimulus can be assessed across individuals rather than within a single individual. By having a large number of individuals give only one free association to a given stimulus, it will be possible to secure an estimate of the population value of the number of different free associations and the probability of each. From these, the ambiguity and balance of these stimuli for all typical members of the culture can be calculated. It will be this “normal” method which will be used in this study.

³ A previous study (Terwilliger, 1962) treated H as a measure of ambiguity. This study will indicate that H measures both ambiguity and balance and that it is possible to demonstrate the independence of these two.

Let it be assumed that the H (balance) 
and N (ambiguity) values of a number of 
statements have been assessed. The empirical 
hypotheses to be tested are as follows: 


Hypothesis 1. With N (ambiguity) held 
constant, there will be a positive correlation 
between H (balance) and the average prob- 
ability of selecting a given statement as the 
more socially desirable in a pair-comparison 
situation. 

Hypothesis 2. With H (balance) held constant, there will be a negative correlation between N (ambiguity) and the average probability of selecting a given statement as the more socially desirable in a pair-comparison situation.

Hypothesis 3. Considering pairs of statements, with differences in N (ambiguity) held constant, there will be a positive correlation between differences in H (balance) between two statements (running from minus when an item has less H, or balance, than another to positive when it has more) and the proportion of times that item will be chosen as more socially desirable in that pair.

Hypothesis 4. Considering pairs of state- 
ments, with differences in H (balance) held 
constant, there will be a negative correlation 
between differences in N (ambiguity) between 
two statements and the proportion of times 
that item will be chosen as more socially 
desirable in that pair. 

Hypothesis 5. Whether an item is selected as more socially desirable than another is more a function of the other alternatives available than of its absolute H (balance) and N (ambiguity) values. Therefore, the correlations predicted in Hypotheses 3 and 4 should be higher than those in 1 and 2.

Hypothesis 6. Since the information (H) equation contains N (ambiguity) in it, there should be a high positive correlation between H (balance) and N (ambiguity).

Hypothesis 7. Since H (balance) and N (ambiguity) are not empirically independent of each other and since H (balance) is a function of both N (ambiguity) and equality of p, which act in opposite directions with respect to affect, it is assumed that the correlations predicted in Hypotheses 1 through 4 will be near zero when the other variable is not held constant.

Hypothesis 8. Since both H (balance) and N (ambiguity) are related to affect, it follows that there should be a high multiple correlation coefficient between H (balance), N (ambiguity), and social desirability. This should also hold for differences in H (balance) and differences in N (ambiguity) with the social desirability determined for each pair separately, as outlined in Hypotheses 3 and 4.


METHOD 


All questions on the Bernreuter (1935) Personality Inventory were reworded such that they became self-reference statements starting with the word “I” rather than questions. Since it seemed desirable to hold the number of words in the statement constant and since statements seven words in length were most common, only these seven-word statements were used (see Table 1). The statements were presented in a pair-comparison test, each item being paired with every other item in random order. There were two forms of this test. These forms differed only in the order in which each statement was presented within each pair. The subjects were instructed to disregard whether or not the statement was an accurate description of themselves but simply to check that statement in each pair of statements which was, in their opinion, the more socially desirable statement.

For each of the statements, the H value and number of different associations were assessed by means of the sentence guessing technique. In this technique the subjects were presented with the first word of the sentence (in this case it was always “I”) and asked to guess the second, then presented with the first two words and asked to guess the third, etc., until each subject had made one guess for each of the six missing words for each of the 10 statements.⁴ The guesses were made in order, guessing the second word with one preceding word as stimulus, then the third word with two preceding words as stimuli, etc. Subjects were required to guess the second word in all 10 statements before being allowed to guess the third word in any statement. This held for subsequent words as well. Each sentence was guessed at with a given number of preceding words as stimuli before another preceding word was added to the stimuli. For any particular series of guesses with a given number of preceding words as stimuli, the order of presenting each sentence was randomized. The materials for this were prepared on a standard dittoed form. All the stimuli on a given page, to which the subject was to guess the next word, had the same number of words. The next page would have that number plus one, etc. In those cases in which the same sequence of words appeared in more than one sentence, it was presented as a stimulus only once.

⁴ It should be noted that the correctness or incorrectness of the subjects’ guesses was not taken into account here. The responses made by the subjects had no bearing on subsequent stimuli presented. Taking the statement “I prefer a play to a dance” as an example, the subject would first guess the word which would follow “I.” He would then guess the word to follow “I prefer” whether or not he happened to guess that “prefer” was the second word.

These tests were administered in the guise of a classroom exercise to 100 female undergraduates. The sentence guessing was presented first, followed directly by the Social Desirability scaling. For each of the six guessed words on each of the 10 statements, the number of different guesses (N) and the probability of each was determined. Using these values, an H value was determined for each position in each sentence. The H and N values for the sentences as a whole were determined by summing the six values obtained for the parts of that sentence. These totals are presented in Table 1. The difference values for a pair of statements were obtained by subtracting their respective H and N values. The reliabilities of the H and N values were estimated by the following method: The subject sample was divided in half randomly and H and N values for the sentences were determined separately for each separate sample of 50 subjects. The obtained values for each sample were correlated to determine the reliability of the values. In a similar manner the reliability of the difference measures was determined. As indicated in Table 1, these coefficients were all above .93, which may be considered highly reliable. It should be noted that the values used in the subsequent hypothesis testing were combined values based on the total sample of 100 subjects. Since these values are based on a larger sample, their reliability is probably even higher than that indicated by the correlations presented.
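The scoring and split-half steps described above can be sketched as follows. This is a minimal illustration, not the author's procedure in detail: the data layout (`data[subject][sentence][position]` holding one guessed word), the function names, and the fixed random seed are all assumptions, and the natural logarithm is assumed for H.

```python
import math
import random
from collections import Counter


def position_h_and_n(guesses):
    """H and N for the guesses made at one word position of one sentence."""
    counts = Counter(guesses)
    total = len(guesses)
    h = -sum((c / total) * math.log(c / total) for c in counts.values())
    return h, len(counts)


def sentence_h_and_n(guesses_by_position):
    """Sentence totals: sum the per-position H values and N values."""
    pairs = [position_h_and_n(guesses) for guesses in guesses_by_position]
    return sum(h for h, _ in pairs), sum(n for _, n in pairs)


def pearson_r(xs, ys):
    """Product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(data, seed=0):
    """Correlate sentence H (and N) totals from two random halves of subjects.

    Assumed layout: data[subject][sentence][position] is a guessed word.
    Returns (r for H totals, r for N totals).
    """
    subjects = list(range(len(data)))
    random.Random(seed).shuffle(subjects)
    half = len(subjects) // 2
    groups = [subjects[:half], subjects[half:]]
    n_sentences, n_positions = len(data[0]), len(data[0][0])
    totals = []
    for group in groups:
        h_totals, n_totals = [], []
        for s in range(n_sentences):
            by_position = [[data[subj][s][p] for subj in group]
                           for p in range(n_positions)]
            h, n = sentence_h_and_n(by_position)
            h_totals.append(h)
            n_totals.append(n)
        totals.append((h_totals, n_totals))
    return (pearson_r(totals[0][0], totals[1][0]),
            pearson_r(totals[0][1], totals[1][1]))
```

Difference scores for a pair of statements would then be obtained by subtracting one statement's totals from the other's, and the same split-half correlation could be applied to those differences.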

The results of the pair-comparison social desirability
ratings were recorded in a 10 × 10 matrix.
Since no statement was paired with itself, the 
diagonals of this matrix were estimated, the value 
of .50 being used. The figure entered in each of 
the 100 cells was the proportion of times the 
sentence on the abscissa had been chosen when paired 
with the sentence on the ordinate. By summing the 
columns and dividing by 10, the average propor- 
tion of times an item was chosen as more socially 
desirable was obtained. The reliability was obtained 
by making two such matrices and obtaining two 
such averages from two independent and random 
samples of 50 subjects. As indicated in Table 1, 
the reliability was .996. Values based on a sample 
of 100 subjects were used in further analysis. 
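The matrix bookkeeping in this paragraph can be sketched as below. The input format (one `(chosen, rejected)` index pair per judgment) and the function names are assumptions made for illustration; the .50 diagonal and the column averaging follow the description above.

```python
def preference_matrix(choices, n_items):
    """Build the pair-comparison proportion matrix.

    choices: iterable of (chosen, rejected) item indices, one per judgment.
    matrix[row][col] is the proportion of times the column item was chosen
    when paired with the row item. Diagonal cells are set to .50; a pair
    that was never judged is also left at .50 (an assumption of this sketch).
    """
    wins = [[0] * n_items for _ in range(n_items)]
    for chosen, rejected in choices:
        wins[rejected][chosen] += 1
    matrix = [[0.5] * n_items for _ in range(n_items)]
    for row in range(n_items):
        for col in range(n_items):
            if row != col:
                total = wins[row][col] + wins[col][row]
                if total:
                    matrix[row][col] = wins[row][col] / total
    return matrix


def average_desirability(matrix):
    """Column sums divided by the number of items."""
    n = len(matrix)
    return [sum(matrix[row][col] for row in range(n)) / n for col in range(n)]


def lower_left_pairs(matrix):
    """The unique off-diagonal cells below the diagonal, keyed by (row, col)."""
    return {(row, col): matrix[row][col]
            for row in range(len(matrix)) for col in range(row)}
```

With 10 items the lower-left half contains 45 cells, one per unique pair of statements.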

For testing the hypotheses about differences between pairs of statements, the probabilities of selecting each item when paired with each other item were needed. There are 45 unique pairs of items. For purposes of this analysis, the 45 in the lower left-half of the above matrix were used. Their reliability was determined as before and is [garbled in source: “O85”]. In addition, it should be mentioned that analyses of the pair-comparison data revealed that there was no response or position bias in the selection of items. An item would be selected equally often whether it appeared first or second in the pair (Dy = .011; CR value illegible in source).

TABLE 1

ABSOLUTE VALUES FOR EACH STATEMENT

I can stand criticism without feeling hurt.
I consider myself a rather nervous person.
I am very talkative at social gatherings.
I worry too long over humiliating experiences.
I prefer a play to a dance.
I usually prefer to work with others.
I have sometimes had spells of dizziness.
I usually enjoy spending an evening alone.
I experience many pleasant or unpleasant moods.
I am troubled with feelings of inferiority.
Reliability
Reliability of differences

[Column values are not legible in the source.]

RESULTS 


Table 2 presents the raw score correlations between the three variables using the average or absolute scores of each of the statements. As predicted by Hypothesis 6, there is a high positive correlation between H and N which is significant at less than the .01 level for 8 df. As predicted by Hypothesis 7, neither of the correlations between the two variables and social desirability is significant, although each is in the direction which should appear if the third variable were held constant. In order to study the relations between each pair of variables with the third held constant, the partial correlation technique was used. These partial correlations are presented in Table 2. It is apparent that there exist large partial correlations between the variables of H and N and social desirability, positive in the case of H (or equality of p) and negative in the case of N. Both of these are significant at less than the .05 level for 7 df. Therefore, Hypotheses 1 and 2 may be considered to be valid. As will be seen from the table, the multiple correlation between the two variables and social desirability is quite large, accounting for nearly half the variance in social desirability. However, with only 7 df, it fails to reach statistical significance. Nevertheless, in view of its size, the significances of the separate coefficients, and the reliability of the instruments, it still may be safe to assume that the two variables acting together constitute an important predictor of social desirability.
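The pattern reported here, near-zero raw correlations that become large once the third variable is held constant, can be illustrated with the standard first-order partial and two-predictor multiple correlation formulas. The three input correlations below are hypothetical values chosen for illustration; they are not the coefficients of Table 2, and the function names are likewise assumptions.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from pairwise r's."""
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)


# Hypothetical raw correlations (NOT the study's values): H and N strongly
# related, each only weakly related to social desirability (S).
r_hn, r_hs, r_ns = 0.90, 0.20, -0.20

h_with_n_constant = partial_r(r_hs, r_hn, r_ns)   # H versus S, N constant
n_with_h_constant = partial_r(r_ns, r_hn, r_hs)   # N versus S, H constant
r_multiple = multiple_r(r_hs, r_ns, r_hn)         # S on H and N together
```

Under these assumed inputs the weak raw correlations turn into strong partials of opposite sign, which is the suppression effect that Hypothesis 7 anticipates.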

Table 3 presents the correlations between the differences in H and N values between members of a pair of statements and the probability of selecting a statement as socially desirable when presented in that pair. The raw score correlations are remarkably similar to those for the absolute values presented above. There is a high positive, statistically significant correlation between differences in H and differences in N as predicted by Hypothesis 6. The correlations between differences in H and social desirability and differences in N and social desirability are both nonsignificant but in the appropriate directions, as predicted by Hypothesis 7. As indicated in Table 3, there are large, statistically significant (less than .01, 42 df) partial correlations between H, N, and social desirability. H and social desirability are related positively with N constant, and N and social desirability are related negatively with H constant. The multiple correlation is also large and statistically significant (less than .01) and a consideration of the beta weights indicates that the two variables have approximately equal weight, although in opposite direction. These data show that Hypotheses 3, 4, and 8 are valid. In addition, the partial and multiple correlations for the difference data are consistently higher than for the absolute or average data, thus validating Hypothesis 5.

TABLE 2

CORRELATIONS BASED ON ABSOLUTE VALUES FOR EACH STATEMENT
(N = 10)

Raw score
  H versus N
  H versus Social Desirability
  N versus Social Desirability
Partial
  H versus N (Social Desirability constant)
  H versus Social Desirability (N constant)
  N versus Social Desirability (H constant)
Multiple

[Coefficient values are not legible in the source.]

TABLE 3

CORRELATIONS BASED ON DIFFERENCES BETWEEN PAIRS OF STATEMENTS
(N = 45)

Raw score
  H versus N
  H versus Social Desirability
  N versus Social Desirability
Partial
  H versus N (Social Desirability constant)
  H versus Social Desirability (N constant)
  N versus Social Desirability (H constant)
Multiple
  Beta H
  Beta N
  R

[Coefficient values are not legible in the source.]


DISCUSSION 


It is apparent from the above results that a statement will be rated as more socially desirable as it becomes less ambiguous (has fewer associations) as long as its balance is held constant. Likewise, it will be rated as more socially desirable as it becomes better balanced (has more equiprobable associations) as long as its ambiguity is held constant. Given a pair of statements, that which is less ambiguous and/or better balanced will be chosen as the more socially desirable, and the greater these differences between the two statements are, the more likely this choice is to be made. It is not the absolute value of the statement which determines whether or not it is to be chosen as socially desirable, but the difference between the statement in question and the alternatives to it.

So far, nothing explicit has been said about possible cause and effect relationships among the variables being considered. One of the reasons for this conservatism was respect for the truism that neither correlations nor any other statistic tell one anything about cause and effect. However, any statistic, while not proving causality, may be interpreted causally providing a modicum of logic and common sense is used in the process. Such an attempt at causal interpretation will now be made.

It seems likely that a verbal unit or other symbol acquires meaning, the power to evoke a response, the power to evoke affect, or any other possible attribute only by virtue of being associated with something else. (The term “association” is used in a totally untheoretical sense. No particular learning theory is implied.) By itself the verbal unit has no powers. When it is said that a verbal stimulus has affect, or social desirability, it is generally meant that a certain emotional state or certain other verbal stimuli (e.g., “good”) have been associated with this verbal stimulus. It may be quite true that either the ability to use verbal symbols and/or the ability to experience or undergo affective arousal are innate, but it must be assumed that the ability of a verbal unit to elicit or have the property of affect is acquired in some manner. Thus association in some form is primary; it precedes the affect of the verbal unit. In view of this, it seems plausible that there may be some particular laws or types of association which help to determine or cause certain particular types of affect.

It is recognized by many psychological theorists that there are unconditioned or innate stimuli for certain specific or general emotional states. It is proposed in this paper that the pattern of associations to a given stimulus determines what emotional or affective state it will call out. Affect is not associated to a word but is called out by its associations. In so far as it has associations, any stimulus will produce some emotional or affective response, and the type and amount of this affect will be modified as the stimulus is modified through the acquisition of new associations. Two possible variables of association acquisition have been studied herein. It has been shown that as more different associations exist, as the stimulus becomes more ambiguous, affect will become less positive. As the associations become more equal in likelihood of occurrence, as the stimulus becomes better balanced, affect will become increasingly positive. It is assumed that it is these associational patterns which cause the affective state we call social desirability.

Certainly this assumption can be defended 
more plausibly and logically than any as- 
sumption that verbal units with certain affec- 
tive characteristics acquired certain associa- 
tional patterns. And this, it would seem, is 
the only other possible explanation. However, 
if affect were primary, one would have to 
make the assumption that pleasant stimuli 
would be used more often. Being used more 
often, these pleasant stimuli should, just by 
chance if nothing else, acquire more different 
associations. Hence, they should become more 
ambiguous. It has been shown, however, that 
pleasant statements have fewer associations. 
Thus this hypothesis seems untenable. It 
must be concluded that the associations of a 
word determine its affective quality. 

In conclusion, some mention must be made 
about the sizes of the correlations obtained. 
By the standards of most psychological work 
they are sizable. Nevertheless, as the multiple 
correlation indicates, we have accounted for 
only 60-70% of the variance in social de- 
sirability. And this could, of course, be an 
overestimate of the true population coefficient. 
Despite the value of accounting for 60% 
of the variance, there remains at least 40% 
unaccounted for. It was not even hoped that
these two variables would exhaust the variance
in social desirability. There are any
number of other dimensions of verbal units
which could theoretically serve as discriminative
stimuli for eliciting affect. Two of these
are familiarity and what might be called
quality.

Previous experiments have indicated that 
as the familiarity of a verbal stimulus in- 
creases, so too will its affect (Johnson et al., 
1960). Familiarity may be considered another
associationistic variable representing the
strength of the given associations, in an
absolute sense rather than in the relative
sense of balance. Had it been possible to
take this variable into account along with 
ambiguity and balance, it is assumed that the 
multiple correlation would have been in- 
creased. Quality refers to the particular, 
specific associations given to a particular 
stimulus. Those psychologists interested in 
projective testing (Kent & Rosanoff, 1910; 
Rotter & Rafferty, 1950) have long assumed 
that the particular free association given re- 
flects something about the affective state of 
the individual, not to mention other person- 
ality characteristics. The only problem with 
this variable has been, and remains, the dif- 
ficulty in quantifying such data in a way 
that a meaningful statistical and theoretical 
analysis can be made of them. The fact that 
such quantification has not been achieved, 
however, does not mean that the variable is 
unimportant. 

These considerations aside, this study has 
shown and attempted to justify the facts that 
the ambiguity (number of different associa- 
tions) and balance (equiprobability of as- 
sociations) of verbal statements are related 
to the social desirability of these statements, 
the former negatively and the latter posi- 
tively. It is assumed that the affective state is 
caused by these associational patterns. 




SUMMARY 


Two attributes of verbal stimuli, ambiguity and balance, were proposed. It was assumed that an increase in the number of free associations to a sentence indicated an increase in its ambiguity. An increase in the equality of probability of occurrence of these associations (described by the H statistic) was assumed to indicate an increase in balance. It was hypothesized that an increase in the balance of a stimulus would lead to an increase in its positive affect (operationally measured as social desirability) while an increase in its ambiguity would lead to a decrease in its positive affect. The data, obtained from sentence guessing and scaling of self-reference statements by 100 subjects, supported these hypotheses. A high positive correlation was found between balance and social desirability (with ambiguity constant) and a high negative correlation between ambiguity and social desirability (with balance constant) was also found. The multiple correlation between ambiguity, balance, and social desirability showed that about 60% of the variance in social desirability was accounted for by these two variables. An attempt was made to justify the assumption that ambiguity and balance cause social desirability. If this were not the case it would appear unlikely that these relationships would have been found. Rather the opposite should hold.


STUDIES IN EFFICIENCY: GSR CONDITIONING AS A FUNCTION OF DEGREE OF TASK CENTERING


JULIUS WISHNER


University of Pennsylvania


The present paper reports on an experi- 
mental test of a prediction that the rate of 
conditioning of the galvanic skin response 
(GSR) will be a function of the interaction 
between motivational state and task require- 
ments. The conditioning method used derives 
from the studies of Welch and Kubis (1947a, 
1947b). The theoretical formulation and the 
method of inducing motivational states stem 
from a conceptualization of psychological 
efficiency developed in relation to a proposal 
to define a dimension of intensity of psy- 
chopathology (Wishner, 1955). 

Welch and Kubis (1947a, 1947b) reported that patients with anxiety conditioned three times as rapidly as normals and extinguished about three times as slowly. The unconditioned stimulus (UCS) in their experiments was a buzzer, and the conditioned stimulus (CS) was a nonsense syllable which had to be differentiated from a group of neutral nonsense syllables, i.e., those not paired with the buzzer. Analysis of the procedures and results reported by Welch and Kubis indicates that, from the standpoint of the concept of efficiency, the relative standings of their groups in speed of conditioning might be reversed under certain conditions, necessitating a reinterpretation of their results, as well as those of some others.

In terms of the efficiency concept, a crucial 
factor is what the subject perceives to be 
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the task requirements. The theory proposes 
that, in general, individuals with greater psy- 
chopathology will be less able to meet the 
task requirements than individuals with less 
psychopathology. Close examination of the 
task requirements in the Welch and Kubis 
(1947a, 1947b) experiments revealed that the 
subjects had been instructed to relax, to keep 
extraneous matters from their minds, and 
to pay no attention to noises which might 
occur in the room. From the point of view 
of the ability to meet task requirements, 
therefore, it might be argued that the patients 
with anxiety were inferior to the normals, 
since the more rapid conditioning of the former can be taken
as evidence of their failure to do precisely what was asked.
The rapid conditioning of the patients may be evidence of a
strong tendency toward uncritical association of events on the
basis of sheer contiguity. It was pointed out that this
interpretation would have special attraction if one could
produce the opposite results when the instructions were varied.
Thus, it was predicted that if subjects were instructed to
solve a problem based on the contiguity of two events, the
results would be reversed (Wishner, 1955).

Specifically, in the present experiment, it was predicted that
there would be a significant interaction between type of
subject and type of task requirement: the more anxious or more
diffuse subject showing more conditioning when conditioning was
not required, as in the Welch and Kubis (1947a, 1947b)
instruction to relax, and showing less conditioning when it is
required as evidence of problem solution; the reverse would be
true of less anxious or focused subjects.

The reasoning underlying these predictions is that diffuse
subjects, when instructed to relax and not to attend to the
events in the environment, are unable to meet this requirement.
Being diffuse, they are likely to be affected by prominent
stimuli. It is postulated that such subjects are likely to form
arbitrary associations based on sheer contiguity in time and
space, with a relative absence of such cognitive activity as a
critical appraisal of their behavior or beliefs. On the other
hand, when they are asked to attend to contiguities, as in the
proposed instructions to solve a problem, subjects will
evidence their diffusion by a relative inability to meet this
requirement. It should be clear that under such instructions
the contiguities in the situation lose their quality of
arbitrariness. It now requires more or less cognitive activity
to select the appropriate contiguities, those which meet
whatever criteria are set by the experimenter. It is assumed,
of course, that all subjects are motivated to do what the
experimenter asks, irrespective of any other motives which may
be induced.

With respect to focused subjects, the rationale of the
predictions is straightforward. These subjects can meet task
requirements best. When asked to relax, they do so, and as a
consequence, resist the formation of associations. Thus, their
rate of conditioning should be slower under relaxed
instructions. When asked to solve a problem, they focus on
that, and condition at a faster rate. The conditioning in this
case is taken as a reflection of problem solution.

Since there are no general laws that relate absolute rate of
conditioning to various environmental conditions, our
experimental predictions must be couched in terms of
statistical interactions between the principal independent
variables: organismic state and problem orientation.

Diffusion, which predisposes to irrelevant activity, may be
brought about by a variety of methods. A potent one in past
experimentation has been the induction of self-centering
through instructions. Thus, Niebuhr (1955) found less
efficiency in a self-centered group as compared to a task
centered group, in a reaction time experiment involving the
measurement of muscle action potentials from both arms as one
arm was occupied with the task, in a manner analogous to
Luria’s (1932) studies. Shipley (1956) showed that
self-centered and task centered subjects differed
systematically in the amount of time spent
on the inspection of pictures when they were 
asked to state a preference, as well as in 
some other respects. The self-centered sub- 
jects seemed less thoughtful and more im- 
pulsive. Weinberg (1958) was able to in- 
duce self-centering in children without threat 
by focusing them on the playful, personal 
aspects of the situation, while the task 
centered children were encouraged to focus 
more directly on the task in an atmosphere 
of cooperation with the experimenter. Task 
centered children appeared to show superior 
performance on two tests of conceptual 
ability. 

These experiments are interpreted as indicating that, in
general, self-centering leads to a decrease in efficiency of
behavior. In accord with the interpretations submitted above,
the following experiments test the prediction that there will
be a significant interaction between a self-centering
(SC)-other centering (OC) variable, induced by instructions,
and the task requirements, also induced by instructions. The
task requirements were varied to require the subject to solve a
problem (P orientation) on the one hand, or to relax (R
orientation) and pay no attention to extraneous matters that
may be occurring in the experimental room, on the other. Thus,
the main experimental design involves four groups:
Self-centered, relaxed (SC-R): subjects are told that the
experimenter is measuring how neurotic they are, and they are
to relax; Self-centered, problem (SC-P): neurotic test, with
instructions to try to solve a problem involving occurrence of
two things together; Other centered, relaxed (OC-R): subjects
are told success or failure of 5 years of the experimenter’s
work depends on their following instructions, and they are to
relax as above; and Other centered, problem (OC-P): other
centered and problem solution, respectively, as above. The
interaction predicted is that the difference between Groups
SC-R and SC-P will be in a direction different from the
homologous comparison of OC-R and OC-P. In other words, the R
and P conditions will have differential effects on the SC and
OC groups.

Lest confusion occur, attention is called to this change in
terminology. In the experiments by Niebuhr (1955), Shipley
(1956), and Weinberg (1958), this group was labeled “task
centered.” The degree of task centering is the generic variable
believed to be manipulated by these instructions. However, it
seems worthwhile to differentiate the presumably manipulated
variable from the instructions per se. Since the latter merely
plead with the subject to do something for the experimenter’s
sake, the term “other centered” seems more appropriate than the
term previously used. It is assumed that these other centering
instructions affect the amount of task centering or attention
to the task, as does “self-centering.” Other centering and
self-centering operate in opposite directions, however, the
latter decreasing task centering, and the former increasing it,
or, at least, not decreasing it as much.
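
The predicted interaction among the four groups can be written as a difference of differences: the R-minus-P difference within the SC groups should have the opposite sign from the R-minus-P difference within the OC groups. The sketch below is only an illustration of that arithmetic. The function names and the cell means are hypothetical and are not data from this study.

```python
# Illustrative sketch only: the cell means below are hypothetical
# placeholders, NOT values reported in the paper.

def interaction_contrast(means):
    """Return (SC difference, OC difference, difference of differences).

    `means` maps the four group labels used in the text
    (SC-R, SC-P, OC-R, OC-P) to a mean number of trials to criterion.
    """
    sc_diff = means["SC-R"] - means["SC-P"]
    oc_diff = means["OC-R"] - means["OC-P"]
    return sc_diff, oc_diff, sc_diff - oc_diff


def differences_oppose(means):
    """True when the R-P difference changes sign between SC and OC."""
    sc_diff, oc_diff, _ = interaction_contrast(means)
    return sc_diff * oc_diff < 0


# Hypothetical means chosen only to show a crossover pattern.
example_means = {"SC-R": 6.0, "SC-P": 30.0, "OC-R": 30.0, "OC-P": 8.0}
sc_diff, oc_diff, contrast = interaction_contrast(example_means)
print(sc_diff, oc_diff, contrast, differences_oppose(example_means))
```

A nonzero contrast is what the interaction term of the analysis of variance evaluates; the sign check corresponds to the stronger, directional form of the prediction.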


METHOD 


Experimental Procedures 


Apparatus. The subject was seated in a wooden chair with arms.
GSR electrodes (sponge rubber, dipped in saline, set in cups
with copper plates and rubber rims) were affixed to his right
palm and right arm above the wrist. In front of him was a
memory drum on which was a tape containing 159 nonsense
syllables, one of which appeared in an aperture, for 6 seconds.
By means of a series of microswitches and relays, a loud bell,
the UCS, was sounded on alternate presentations of one of these
syllables. There was, thus, a regular 50% reinforcement
schedule as in the Welch and Kubis (1947a, 1947b) procedure.
The UCS emanated from under the subject’s seat. The various
time relations will be given below.

A continuous record of the subject’s GSRs was made in an
adjacent room by means of a modernized version of the Darrow
(1932) Behavior Research Photopolygraph, a sensitive
instrument, which, at resistances of 33,000 ohms or less,
yields perceptible deviations of the galvanometer needle when
the change in resistance is as low as 10 ohms.

The criteria for scoring a response as a conditioned response
(CR) were: (a) The response must begin at least 1 second after
the onset of the CS; (b) The response must begin no later than
.5 second after the onset of the succeeding stimulus; (c) The
response must be equal to or larger than the mean of all
responses to any neutral syllables during acquisition.

Recordings of muscle action potentials from the forehead and
the neck were also made in an adjacent room for exploratory
purposes.

Instructions 


Half of the subjects were told that we were about to measure
how neurotic they were and half were told that on their
cooperation depends the success or failure of 5 years of our
work. We shall call the first the self-centered (SC) group and
the second the other centered (OC) group. Half of the SC group
was told that we could measure how neurotic they were by seeing
how well they could relax, that they were to pay no attention
to noises which might occur in the room and which were only
signals between the two experimenters (one in the room with the
subject and one in an adjacent room), that they were to read
the nonsense syllables which would appear in the aperture of
the memory drum by spelling them out aloud as soon as they
appeared, that we were asking them to do this so that they
would not fall asleep over the course of the experiment which
would last about a half hour. The other half of the SC group
was told that we could measure how neurotic they were by seeing
how well they solved a problem. The problem would involve two
things occurring together in some way within the room, that
they were to discover what these two things were and how they
occurred together, but that they were not to report their
solution of the problem until after the experiment was over.
Similar instructions were given to the OC group. Half were told
that they could help us out best by relaxing and so forth, as
above; and half were told that they could help us out best by
solving a problem which would involve two things occurring
together in some way. All instructions were administered by the
author.


Additional Measures


Somewhat standardized post-experimental interviews were
conducted with each subject. These were designed to elicit the
subject’s feeling about the instructions and the degree of
insight into the nature of the situation to which the subject
had been subjected. In addition, after the purposes of the
experiment were explained to the subject, the 50-item Taylor
Manifest Anxiety (MA) scale and the Porteus Maze Test were
administered. However, because of the unavailability of almost
half of the subjects for the administration of the Porteus, the
results for this test were not analyzed.


Design and Method 


Subjects were assigned at random to the four groups: SC-R,
SC-P, OC-R, and OC-P. Since a pilot experiment had indicated
the possibility of a significant influence of sex, it was
included as an additional independent variable. There were 40
subjects in all, 10 in each group, 5 males and 5 females. All
subjects were undergraduate students in an introductory
psychology course at the University of Pennsylvania.

Prior to exposing the nonsense syllables, a number of sets of
asterisks, usually three, in a few cases only two, were shown
in the aperture of the memory drum in order to adapt the
subject to the functioning of the drum. Following these, the
first nonsense syllable remained in view for 6 seconds,
followed immediately by a new one. The CS, XAV, appeared at
predetermined intervals as follows: CS, one neutral syllable,
CS; UCS, two neutral syllables, CS, three neutral syllables,
CS; UCS, two neutral syllables, CS; etc. This pattern was
adopted in an attempt to measure differences in the degree of
insight among subjects. However, the data obtained on this
aspect did not seem reliable and were not analyzed. On the
first presentation of the CS, after it had been in the window
for 3.5 seconds, a relatively loud bell, situated under the
subject’s chair without his knowledge, was sounded for 1
second, and then the syllable remained in the window for 1.5
seconds more. The entire procedure was automatic, although
there were several instances where defective operation caused
the bell to be delayed. All subjects were included where the
bell sounded while the CS was still in the memory drum window;
otherwise, they were excluded. An additional and important
criterion for inclusion of any subject was the unconditioned
response (UCR) pattern. Only those subjects were included who
gave detectable UCRs to all first four presentations of the
UCS, or gave [number illegible] of the first 10 UCS
presentations, or gave a total of 10 UCRs of the 25 possible.
Altogether, there were 25 presentations of the UCS and 50
presentations of the CS. This was followed by 25 presentations
of the CS without the UCS in an attempted extinction procedure.


response 


RESULTS AND DISCUSSION 


The data were analyzed in terms of the number of trials (each
presentation of the CS is counted as one trial) necessary for
the establishment of a stable CR, defined as three successive
CRs, to accord approximately* with Welch and Kubis (1947a,
1947b). Fifty-one trials are possible, because with alternate
presentations of the UCS, the first extinction trial may be
viewed as part of the acquisition series, since the subject
does not discover that the UCS does not appear until after he
has had an opportunity to give a CR. Seven subjects did not
meet this criterion: three in the male SC-P group, two in the
male OC-R group, and two in the female OC-R group. These
subjects were assigned scores of 51, as the best estimate of
the minimum number of trials to conditioning. It should be
apparent that these constitute conservative estimates from the
viewpoint of the main hypothesis, which involves the prediction
of relatively slow conditioning for the OC-R and SC-P groups.
All of the assigned scores were in these groups.

In a pilot experiment, it was found that the UCRs of many
subjects disappeared after the first [illegible] UCS (a
[illegible] second buzzer). Some of these reached the criterion
of conditioning; most did not. Clearly, the interpretation of
speed of conditioning in the absence of UCRs constitutes a
serious problem. Therefore, these criteria for number of UCRs
were adopted in the present study.

* Welch and Kubis (1947a, 1947b) used as their criterion three
successive CRs to unreinforced CSs. Since there is ample
opportunity for a CR to occur on reinforced CS presentations,
the present criterion seems sufficiently stringent. On the
other hand, total number or percentage of CRs over a standard
number of trials, as in the Taylor (1951) eyeblink conditioning
studies, is a poor dependent variable in this type of GSR
conditioning because of the marked adaptation effects.
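
The dependent variable described above, trials to a run of successive CRs with a ceiling score of 51 for subjects who never reach the run, can be sketched as follows. This is a hedged illustration: the function name is hypothetical, and it assumes the score is the trial on which the run is completed, which the text does not state explicitly.

```python
MAX_TRIALS = 51  # ceiling score assigned in the text to non-conditioners


def trials_to_criterion(cr_flags, run_length=3, ceiling=MAX_TRIALS):
    """Return the trial number on which `run_length` successive CRs
    is first completed, or `ceiling` if the run never occurs.

    cr_flags: one boolean per CS presentation, in order; True means
        the response on that trial was scored as a CR.
    Assumption: the score is the trial completing the run, not the
        trial that starts it.
    """
    run = 0
    for trial, is_cr in enumerate(cr_flags, start=1):
        run = run + 1 if is_cr else 0
        if run == run_length:
            return trial
    return ceiling


# Hypothetical record: CRs on trials 2, 3, and 4.
record = [False, True, True, True, False]
print(trials_to_criterion(record))                 # three-CR criterion
print(trials_to_criterion(record, run_length=2))   # two-CR criterion
```

Passing `run_length=2` gives the alternative two-CR criterion that is also analyzed in Table 1.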

The principal results are illustrated in 
Figures 1 and 2. Results of an analysis of 
variance of these data are shown in Table 1 

Also shown in Table 1 are the results of an 
analysis of variance of trials to a criterion of 
two successive CRs. Although this differs from 
the criterion utilized by Welch and Kubis 
(1947a, 1947b), it obviated the necessity for 
assigning arbitrary, even if conservative," 
scores to any subject, since all subjects met 
this criterion. 

It is clear from Table 1 that the main hy- 
pothesis, predicting an interaction between 
SC-OC and R-P, is confirmed, whichever 
criterion is used. 
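
As a rough check on that reading of Table 1, F ratios can be recomputed from the mean squares that remain legible in the scanned table. The sketch below rests on several assumptions: that the legible paired figures are sums of squares and mean squares, that every effect has 1 degree of freedom, and that 135.5 is the within-groups mean square. The scan does not show which of the two criteria these figures belong to, so the results are illustrative and are not the F values printed in the original.

```python
# Mean squares as read from the scanned Table 1 (assumed reading).
MS_WITHIN = 135.5
SS_WITHIN = 4335.0

MEAN_SQUARES = {
    "SC-OC": 34.2,
    "R-P": 99.2,
    "Sex": 15.6,
    "SC-OC X R-P": 1703.0,
    "SC-OC X Sex": 1177.2,
    "R-P X Sex": 801.0,
    "SC-OC X R-P X Sex": 3.0,
}


def f_ratio(ms_effect, ms_error):
    """F is the effect mean square divided by the error mean square."""
    return ms_effect / ms_error


f_ratios = {
    source: round(f_ratio(ms, MS_WITHIN), 2)
    for source, ms in MEAN_SQUARES.items()
}

# Within-groups df implied by SS / MS; 40 subjects in 8 cells gives 32.
df_within = round(SS_WITHIN / MS_WITHIN)

for source, f in f_ratios.items():
    print(f"{source:<20} F(1, {df_within}) = {f}")
```

Under these assumptions the SC-OC X R-P term gives by far the largest ratio, which is consistent with the conclusion stated in the text.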

There were significant interactions between
sex and SC-OC, for the three trial criterion,
and sex and SC-OC, and R-P, respectively,
for the two trial criterion, illustrated in
Figures 1 and 2. It is clear that the females
were behaving differently from the males, but
the interpretation of this result is not obvious,
and, in any case, is peripheral to our main


5 Presumably, the within-groups variance would increase along
with the increase in mean differences, and the former might
vitiate the latter. It should be noted, however, that the
increase in variance would have to be quite large before F
would be reduced to a level where p > .05. F would need to be
reduced to less than [fraction illegible] of its present value
for this to happen. Thus, for example, assuming an increase of
40 trials in one of the “51” scores in the SC-P group, while
all other “51” scores remain constant, a situation which
greatly increases the error variance without much compensation
in the mean differences, F is still 8.81, p still < .01. If
there were a tendency for these seven values to increase
together, this would increase the variance at a slower rate,
and F would remain significant for large increases in the
individual observations. It should be noted that this problem
does not arise in the analysis to a criterion of two trials,
since all subjects achieved this criterion, one indication,
among others, that these subjects really were conditioned in
some sense.


FIG. 1. Interactions in terms of mean number of trials to three
successive CRs. (Panels: SC-OC X R-P interaction; sex
interaction. Horizontal axis: relax and problem orientations.
Vertical axis: trials to 3 successive CRs. Legend:
self-centered, other-centered; male, female.)

interests here. There are some indications that, among other
things, some of the experimenter’s characteristics and his
manner of administering the instructions may be of critical
significance, but it would require careful sampling of both the
experimenters and the subjects to test this hypothesis.

With the three trial criterion, the SC-R group conditioned
about five times as rapidly as the OC-R group (t = 3.08,
p < .01), and the OC-P group conditioned about three times as
rapidly as the SC-P group (t = 1.83, .10 > p > .05). These data
seem quite comparable to those of Welch and Kubis (1947a), who
found that their normals conditioned in approximately 7 trials
and the patients in 21. With the two trial criterion, the SC-R
group conditioned almost five times as rapidly as the OC-R
group (t = 2.76, p < .01), and the OC-P group conditioned about
five times faster than the SC-P group (t = 2.07, p < .05).
Thus, by means of an analysis of task requirements, it was
possible to reproduce data based on “natural” variables under
relatively more controlled conditions.

This is not to argue that the mechanism by which the SC-OC
instructions work their effects is perfectly clear.
Nevertheless, various qualitative observations make it appear
likely that the interpretation in terms of attention to task
requirements is at least tenable. For example, OC subjects
appeared to have a better qualitative grasp of the nature of
the problem in the P condition. On the other hand, OC subjects
in the R condition reported delays in associating, or even
seeking to associate, the CS and UCS cognitively.

Perhaps the clearest qualitative observations arising out of
the post-experimental interviews were the differences in gross
behavior between the SC and OC groups in both experiments. In
response to the experimenter’s initial inquiry as to how things
had gone, the SC subjects generally gave very


TABLE 1

ANALYSIS OF VARIANCE OF TRIALS TO CRITERION

Source                    SS        MS
SC-OC                   34.2      34.2
R-P                     99.2      99.2
Sex                     15.6      15.6
SC-OC X R-P           1703.0    1703.0
SC-OC X Sex           1177.2    1177.2
R-P X Sex              801.0     801.0
SC-OC X R-P X Sex        3.0       3.0
Within                4335.0     135.5

Fragments of further columns (2.34*, 2.28, 12.109****) and the
significance key are not legible enough to be placed.


FIG. 2. Interactions in terms of mean number of trials to two
successive CRs. (Horizontal axis: relax and problem
orientations. Legend: self-centered, other-centered; male,
female.)


short, sometimes gruff, replies. Throughout the interview SC
subjects volunteered little information. Every item of interest
had to be extracted separately. Generally, they claimed not to
have believed that this could be a test for neurosis. While
some gave a clinical impression of resentment toward the
experimenter, none expressed this verbally. However, after the
experimenter offered an explanation of the experiment, the SC
subjects relaxed perceptibly, became much more verbal, and most
lost their sullenness. A few even admitted to having been
concerned about whether they were neurotic.


OC subjects, on the other hand, appeared 
to behave differently. They tended to be quite 
verbal, volunteered all kinds of information 


about their feelings, expressed overt concern 
about the quality of their performance, and 
admitted to a good deal of anxiety over the 
possibility of having spoiled anything for 
the experimenter. Most thought the experi- 
ment interesting despite the boring task set 
for the subject, but could verbalize their
boredom during parts of the experiment.
Only one subject, in the male OC-R group,
failed to mention the CS-UCS relation during 
the post-experimental interview. Thus, the 
differential conditioning rates do not seem due 
to differential awareness. Indeed, the one sub- 
ject who could not verbalize this relation 
conditioned in one trial. 

To interpret the results as a whole in terms 
of drive theory, a combination of the argu- 
ments utilized by Taylor (1951) and by 
Farber and Spence (1953) might be applied.
There appear to be three possibilities of vary- 
ing tenability: 

1. Peculiarities in the instructions produced 
higher drive in the SC group in the R condition
and in the OC group in the P condition.
One can neither prove nor disprove such an 
assertion without having independent meas- 
ures of drive. Nevertheless, there is no posi- 
tive support in any of the ancillary observa- 
tions made for such an argument. 

2. The SC group had higher drive in both conditions, but more
competing responses were activated by the P instructions. Such
an assertion implies the assumption of a kind of cognitive
activity on the part of the subject which drive theorists seem
to prefer to avoid. Even if taken seriously, however, this
argument seems contradicted by the overt behavior of the
subjects during the post-experimental interview. The SC
subjects, on the whole, showed few manifest disturbances,
tended to deny concern over the SC instructions until the
experimenter confessed that he was not really measuring
neurosis, and they did not show systematically higher GSRs to
the neutral stimuli in the experiment. The OC subjects, on the
other hand, tended to display overt concern over their
performance, often inquired if they had spoiled anything for
the experimenter, and generally offered to help further in any
way they could.

3. The OC subjects had higher drive in all conditions, but the
R instructions, insofar as they must have seemed contradicted
by the objective nature of the CS-UCS relation, tended to
arouse more competing responses. Again, even conceding the
possibility of this type of theorizing within drive theory, the
qualitative observations do not seem to support such an
assertion. It is true that the OC subjects showed more overt
concern, during the post-experimental interview, over their
performance during the experiment. However, 
the OC-R group did not offer more hypotheses 
as to the nature of the experiment than the 
SC-R group. On the contrary, the SC-R 
subjects seemed more suspicious in this re- 
spect, often proposing a number of possible 
relations between the stimuli, including some 
bizarrely magical ones. 

Finally, as can be seen in Table 2, there 
were no significant correlations between speed 
of conditioning and Taylor MA scale scores. 
Neither were there systematic differences 
among the groups in Taylor MA scale scores. 
Of course, it must be remembered that the 
Taylor scale was administered after the 
nature of the experiment had been explained, 
and the lack of correlation cannot be con- 
sidered contradictory of Taylor’s previous 
findings on these grounds. 
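
For readers who wish to see how a rank correlation of the kind reported in Table 2 is obtained, the following is a minimal sketch in plain Python. It computes the Spearman coefficient as the Pearson correlation of average ranks, which handles tied scores. The function names and the example scores are hypothetical; the paper does not report individual scores or say how ties were treated.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman rank correlation as the Pearson correlation of ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores: trials to criterion and anxiety scale scores.
trials = [3, 7, 12, 20, 51]
anxiety = [10, 14, 15, 22, 30]
print(spearman_rho(trials, anxiety))
```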

The interpretation preferred here is in terms of ability to
focus on task requirements. Those subjects who are told that
their degree of neurosis is being measured tend to have some
significant portion of their attention diverted from the task
at hand. To the extent that they accept the idea that the
experimenter might really detect their neurotic tendencies,
they may become generally suspicious. Thus, when instructed to
relax, they notice quickly the CS-UCS relationship, although
they have been told almost explicitly to pay no attention to
it. When they are instructed to solve a problem, on the other
hand, they may suspect the CS-UCS relation as being too simple.
Stable CR acquisition, insofar as it reflects such perception,
is therefore delayed.

Those subjects who are told that the success or failure of 5
years of the experimenter’s work depends on their performance
are not necessarily less “anxious” than the others. Indeed,
their overt tension levels seem higher, as indicated by the
amount of verbalization and gross muscular movement. It
appears, rather, that it is the direction of their attention
that has been manipulated. The energies of these subjects are
mobilized in the service of the experimenter. When asked to
relax, they do, and CR acquisition is thereby delayed; when
asked to solve a problem, they see the solution rapidly, and CR
acquisition is thereby hastened. In this sense, the OC subjects
are focused and meet the task requirements more rapidly than
the SC subjects, who are diffuse with respect to the task
requirements.


TABLE 2

SPEARMAN RANK-CORRELATION COEFFICIENTS BETWEEN SPEED OF
CONDITIONING AND TAYLOR SCALE SCORES

Group        Three trial criterion
SC-R
SC-P
OC-R
OC-P
Combined

(The coefficients are not legible in the scan.)

This interpretation not only generated the prediction of a
surprising result, but also finds no particular difficulty in
accounting for the higher rate of conditioning of psychotics in
the Spence and Taylor (1953) study, where drive theory,
admittedly, cannot account for the results. In terms of the
concept of efficiency, it seems reasonable for psychotics,
generally conceded to be high in psychopathology, to show much
conditioning in situations where the cognitive task
requirements are unclear. At the same time, the very unclarity
of the requirements for the subject in eyeblink conditioning
studies makes the general interpretation of those data more
difficult from the present standpoint.

Several directions for further study seem indicated by these
experiments. One of the primary goals of this series of studies
is to
help formulate a dimension of intensity of 
psychopathology in terms of the degree to 
which one meets environmental requirements. 
The present study constitutes an attempt to 
manipulate the subject’s ability to meet task 


requirements. Will psychopathological sub- 
jects demonstrate behavior similar to our SC 
subjects? If so, it would support the central 
role of the concepts “diffuse” and “focused” 
in the analysis of behavior. Might it also 
throw light on the role of narcissism, as an 
analog of self-centeredness (or vice versa), in 
creating diffusion with respect to task require- 
ments in psychopathology? Can the influence 
of self-centering be shown in phenomena other than efficiency
of action in reaction time (Niebuhr, 1955) and conditioning? It
may be of some interest that an efficiency factor has been
postulated to account for some of the intercorrelations among
scores on the Rorschach test (Wishner, 1959). Several studies
are now under way to test the limits of these concepts.


SUMMARY 


Previous findings by Welch and Kubis (1947a, 1947b) were
interpreted to reflect inferior meeting of task requirements by
anxious subjects despite their superiority over normals in
speed of GSR conditioning. The specific question raised in
these experiments was whether the relative standing of two
groups in speed of conditioning could be inverted as a function
of the task set for the subject, even though the objective
nature of the situation remained constant.

In both experiments, four main groups were used: self-centered
(“We are going to measure how neurotic you are.”)-relaxed
orientation (“We can do this by seeing how well you can relax
. . . just spell the syllables aloud . . . pay no attention to
any noises that may occur. . . .”); self-centered-problem
orientation (“We can do this by seeing how well you can solve a
problem that has to do with the occurrence of two things
together. . . .”); other centered (“The success or failure of 5
years of my work depends on . . . how well you follow my
instructions.”)-relaxed orientation; and other centered-problem
orientation. There were 10 subjects in each group, 5 males and
5 females.

It was predicted that there would be an interaction between the
SC-OC instructions and the R-P orientations, such that the SC
groups would tend to condition relatively more rapidly than the
OC group under the R orientation than under the P orientation.
The prediction was confirmed on the basis of an analysis of
variance. Qualitative observations tended to support an
interpretation of the data in terms of the differential
efficiency of these groups in meeting task requirements.
Significant interactions with sex were also found. It was
contended that these results cannot be made to accord with
drive theory very readily.


PERFORMANCE EXPECTANCY AS A DETERMINANT OF ACTUAL PERFORMANCE


ELLIOT 
Harvard 


In recent years, several theorists have suggested that
individuals have a need for cognitive consistency. Typical of
these concepts is Festinger’s theory of cognitive dissonance
(Festinger, 1957; Festinger & Aronson, 1960). According to the
theory, when a person holds two ideas which are psychologically
inconsistent (dissonant) he experiences discomfort and attempts
to reduce the dissonance. The most common method of reducing
dissonance is to change or distort one or both of the
cognitions, making them more consistent (consonant) with each
other.

Virtually all of the experiments which have 
been conducted to test these theories contain 
a basic but implicit assumption: that a person 
sees himself as good, honest, intelligent, and 
rational and consequently expects to behave 
in a good, honest, intelligent, and rational 
manner. In effect, these experiments have 
contained the tacit assumption that individ- 
uals have a high or “good” self-relevant per- 
formance expectancy. The existence of this 
assumption can best be understood by con- 
sidering a typical experiment. 

Ehrlich, Guttman, Schonbach, and Mills 
(1957) predicted that people who had re- 
cently purchased a new car subsequently 
would read more advertisements about that 
make of car than about any other make of 
car. That is, by seeking positive information 
about the car they had just purchased, they 
could reduce the dissonance that might be 
introduced by the few negative qualities of 
the car. But this hypothesis involves the 
tacit assumption that these individuals con- 
sidered themselves to be intelligent, rational 
people, and, thus, expected to purchase a 

1 This research was partially supported by a grant
from the National Institute of Mental Health (M
4387) and by a grant from the National Science
Foundation (NSF G-16838), both administered by
the senior author. This research was conducted while
the junior author was on the tenure of a National
Science Foundation fellowship.

2 Now at the University of Minnesota.
superior car. Suppose an individual had
learned from long hard experience that he is
the type of person who never does anything
right. Would negative aspects of his newly-purchased
car be dissonant with his self-concept?
Hardly. In fact, one might predict that
such a person would experience dissonance if
his car were superior and hence he might
expose himself to ads describing cars which he
had not purchased. Indeed, a few of the
subjects in the experiment did just that. It is
highly speculative but perhaps useful to suggest
that these subjects may have had a generally
negative self-concept, or at least a
negative performance expectancy regarding
their ability to purchase a superior car.

To generalize from this example, it is suggested
that it would be of value to make
explicit the role of a person’s self concept in
the arousal of dissonance. In most situations
dissonance actually involves a cognition about
the self. Thus, instead of stating that dissonance
exists between two inconsistent cognitions,
it may be more useful to state that
dissonance exists between a cognition about
the self (i.e., a self-relevant performance
expectancy) and a cognition about behavior
which is inconsistent with this expectancy.
Events which coincide with self-relevant performance
expectancies are consonant, pleasant,
sought out; events which are discrepant
from these expectancies are dissonant, unpleasant,
avoided, or minimized.

This formulation leads to the prediction
that an individual who has a clear conception
of his ability at a given task will experience
dissonance if his behavior differs sharply from
this expectancy. Thus, if a person expects to
do well and does poorly, he will experience
dissonance and attempt to minimize this
performance. However, this is a rather trivial
prediction. Since his performance was objectively
poor, we need not appeal to the idea
of inconsistency to account for his discomfort.
In our culture, people typically are rewarded
for doing well and punished for doing poorly.
On this basis alone, we would expect this
person to minimize poor performance. But
what happens to a person who is convinced
that he is inept at a given task, and then
suddenly discovers that his performance was
excellent? Again, with the value placed on
good performance in this culture, we would
expect him to express feelings of pleasure and
satisfaction. Yet, according to our formulation,
his excellent performance is inconsistent
with his negative performance expectancy
and, thus, should cause discomfort. If a disconfirmed
expectancy is a powerful force
then we would predict that this person, conceiving
of himself as inept but performing
well, will be uncomfortable with this superb
performance. A behavioral measure should
reflect this discomfort, even if the person
verbalizes satisfaction with the success. Some
evidence for this contention is supplied in
an experiment by Deutsch and Solomon
(1959). In a group task, some subjects were
made to feel that they had performed well;
others were made to feel that they had performed
poorly. Their performance was then
evaluated by a teammate. Subjects tended to
be more favorably disposed toward a teammate
whose evaluations were consistent with
her own.

In the present experiment, the theoretical
ideas discussed above were tested by (a)
systematically manipulating an individual’s
expectancy concerning his ability on a given
task and (b) systematically manipulating his
performance so that it is either consistent or
inconsistent with his performance expectancy.
The hypothesis is that a performance consistent
with a person’s expectancy will be
consonant (i.e., pleasant, acceptable); a performance
inconsistent with his expectancy
will arouse dissonance (i.e., will be unpleasant,
unaccepted).


PROCEDURE


In general, the procedure involved (a) allowing
subjects to perform a task; (b) presenting some
subjects with information which led them to form
a high self-concept, or expectancy, regarding their
skill on the task, while presenting other subjects
with information which led them to form an
expectancy of poor performance regarding their
skill on the task; (c) allowing some subjects to
perform in a manner which was consistent with this
expectancy while allowing others to perform in a
manner inconsistent with their expectancy; (d)
obtaining a measure of discomfort or displeasure
with this performance.


Subjects and Task 


The subjects were 40 female undergraduates who
were paid volunteers for an experiment “on personality.”
The experimenter led the subjects to believe
that he was interested in correlating interview-type
personality tests with short answer, paper-and-pencil
tests. The experimenter explained that he was interested
in finding a few quick tests which supplied
the same information about a person as interviews;
he would then be able to save a great deal
of time and effort by simply using these tests in
lieu of the more cumbersome interviews. The experimenter
told each subject that he would like her
to take a few of the short tests during this session
and that he would interview her (with her permission)
at some later date.

As a warm-up, the experimenter administered a
short self-rating scale. After the subject completed
this test, the experimenter introduced the “next” test
which was actually the last test that the subject
was to take. This was a bogus instrument which
was introduced as an index of social sensitivity and
was described as a highly valid and reliable test:


This test has been widely used with remarkable
success by psychologists for several years. Moreover,
in my own work, thus far, it has proved to
be the most useful of all the short, objective
tests I've tried. It is an excellent measure of how
sensitive an individual is to other people; i.e., the
subjects who score high on this test are the same
people who, when interviewed, express a good deal
of understanding and insight into other people.
Subjects who score low on this test, on the other
hand, tend to express a very superficial understanding
of other people when interviewed.


The test consisted of 100 cards; on each card
were three photographs of young men. The experimenter
explained that one of the photos on each
card was that of a schizophrenic. The subjects were
told that the test measured their ability to judge
which of the young men was the schizophrenic.
They were told that they could use whatever cues
they deemed relevant. The experimenter informed
the subjects that some people do extremely well on
this test, getting as many as 85% correct; and
that some people perform very poorly, getting as
few as 20% correct. The experimenter then reiterated
that people who score high on this test show a
great deal of sensitivity when interviewed, while
people who score low show very little sensitivity.
Actually, there were no correct answers; the pictures
were clipped randomly from an old Harvard yearbook;
to the best of our knowledge, none were
schizophrenic.

The experimenter explained that it is very difficult
for people to judge their performance on the
test; that some people who think they do very
poorly are among the best performers, and vice
versa.

The test was divided into five sections, with 20 
cards in each section, and with a 3-minute rest 
between each section. The experimenter informed the 
subject that this division was to allow the subject 
an opportunity to rest at intervals during the test. 
Actually, it was to afford the experimenter the op- 
portunity to feed the subject specific information 
about her performance on each section, in order to 
allow the subject to form a consistent performance 
expectancy. After the subject had completed each of 
the first four sections of the test, the experimenter 
pretended to score the subject’s performance by 
comparing her responses with an answer key. The 
experimenter then reported a false prearranged 
score to her. At the end of the fifth section of 
the test, the experimenter handed the subject an 
answer key and allowed her to score her own 
performance, in order to allay any suspicions 
the subject might have concerning the veracity of 
the reported scores. Actually this score was also 
false; the experimenter had recorded the subject’s 
responses in such a manner that, even when the 
subject scored her own performance, she would 
receive a prearranged score. After scoring the exam, 
most subjects whose earlier performance had been 
disconfirmed asked the experimenter to check the 
accuracy of their scoring. The experimenter did this 
and assured them that their scoring was accurate.

The experimenter administered the test by holding 
each card up until the subject made her choice. 
He then flipped over the card, recorded her re- 
sponse, and exposed the next card. In order to 
limit the length of time each subject was exposed 
to the cards, the experimenter informed the subject 
that she must make her selection within 10 seconds. 


Experimental Conditions 


The subjects were randomly assigned to one of 
four experimental conditions. One half of the subjects 
were given information about their performance 
on the first four sections of the test which led them 
to form a high performance expectancy; the other 
half were given information which led them to form 
a low performance expectancy. Specifically, the highs 
were given scores of 17, 16, 16, and 17, while the 
lows were given scores of 5, 4, 4, and 5. Then, on 
the fifth section of the test, one half of each group 
received a score of 17 while the others received a 
score of 5. Thus, on the fifth section of the test, 
(a) 10 subjects “performed” in a manner which 
was consistent with a high expectancy (High-High) ; 
(b) 10 subjects “performed” in a manner which was 
inconsistent with a high expectancy (High-Low) ; 
(c) 10 subjects “performed” in a manner which 
was consistent with a low expectancy (Low-Low) ; 
(d) 10 subjects “performed” in a manner which was 
inconsistent with a low expectancy (Low-High). 
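
The design just described can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `assign_conditions`, the seed, and the subject numbering are assumptions introduced here, while the reported scores (17, 16, 16, 17; 5, 4, 4, 5; and 17 or 5 on the fifth section) and the cell size of 10 are taken from the text.

```python
import random

# Feedback schedules reported in the text (scores out of 20 per section).
EXPECTANCY_SCORES = {"High": [17, 16, 16, 17], "Low": [5, 4, 4, 5]}
FIFTH_SECTION_SCORE = {"High": 17, "Low": 5}


def assign_conditions(subject_ids, seed=None):
    """Randomly split subjects evenly across the four expectancy x outcome cells.

    Hypothetical helper: the paper states only that assignment was random.
    """
    cells = [(e, f) for e in ("High", "Low") for f in ("High", "Low")]
    shuffled = list(subject_ids)
    if len(shuffled) % len(cells) != 0:
        raise ValueError("subjects must divide evenly among the four cells")
    random.Random(seed).shuffle(shuffled)
    per_cell = len(shuffled) // len(cells)
    plan = {}
    for index, subject in enumerate(shuffled):
        expectancy, fifth = cells[index // per_cell]
        plan[subject] = {
            "condition": f"{expectancy}-{fifth}",
            # Four expectancy-building scores followed by the fifth-section score.
            "reported_scores": EXPECTANCY_SCORES[expectancy] + [FIFTH_SECTION_SCORE[fifth]],
            "consistent": expectancy == fifth,
        }
    return plan


plan = assign_conditions(range(1, 41), seed=0)  # 40 subjects, as in the study
```

Each entry of `plan` records the condition label, the five prearranged scores, and whether the fifth score agrees with the induced expectancy.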


Dependent Variable 


The subject’s reaction to her performance on
the fifth section of the test was measured by
allowing her to respond to the identical section of
the test again. The experimenter could then observe
how many of her previous responses she changed;
the number of changed responses served as an
operational definition of the subject’s discomfort with
her performance on the fifth section of the test.

This operation was accomplished in the following
manner. After the subject completed the fifth section
of the test (but before the test had been scored),
the experimenter pretended to be quite chagrined.
In response to the subject’s inquiry, the experimenter
informed the subject that he was supposed to
time her speed of performance but had neglected to
do so on the fifth section of the test. He then
asked the subject to score her own performance
while he ruminated in an attempt to decide what
to do about his omission. After the subject reported
her score, the experimenter recorded it and informed
her that he absolutely needed a measure of her
time in order to complete his records.


There’s only one thing to do. Would you mind 
terribly if I asked you to take the fifth section 
of the test over again? Why don’t you just 
pretend that it’s a completely new set of pictures; 
i.e., respond as if you've never seen the pictures
before—that way I can get a fairly accurate 
measure of the time it takes you to complete the
set.


After the subject completed her task, the experi- 
menter explained the true purpose of the experiment 
and discussed the necessity for the deception. None 
of the subjects had suspected the purpose of the 
experiment but none expressed any resentment at 
having been deceived. On the contrary, most of 
the subjects expressed a good deal of interest in 
the design and questioned the experimenter at 
length regarding several of the methodological 
details. 


RESULTS AND DISCUSSION


As mentioned above, the dependent variable 
used was the number of choices which were 
changed on the repeat performance of the 
fifth test. This measure should reflect ac- 
curately the amount of comfort or satisfaction 
with the previous performance. It is obvious 
that changing no responses will guarantee an
identical performance. Changing a large num- 
ber of responses will guarantee a low score if 
the previous score was high, and will virtually 
guarantee a higher performance if the previ- 
ous score was very low. 
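
The dependent measure is simple to state computationally. The following sketch assumes, for illustration only, that each pass through the 20 cards of the fifth section is stored as a list of chosen photograph numbers; the example choices are hypothetical and are not data from the experiment.

```python
def count_changed_responses(first_pass, repeat_pass):
    """Number of cards on which the repeat choice differs from the original choice."""
    if len(first_pass) != len(repeat_pass):
        raise ValueError("both passes must cover the same cards")
    return sum(1 for a, b in zip(first_pass, repeat_pass) if a != b)


# Hypothetical choices (photo 1, 2, or 3) on the 20 cards of the fifth section.
first_pass = [1, 3, 2, 2, 1, 3, 3, 1, 2, 1, 1, 2, 3, 3, 2, 1, 2, 3, 1, 2]
repeat_pass = [1, 3, 2, 1, 1, 3, 2, 1, 2, 1, 3, 2, 3, 3, 2, 1, 2, 1, 1, 2]
changed = count_changed_responses(first_pass, repeat_pass)
```

A count of zero corresponds to an identical repeat performance; larger counts are read, under the operational definition above, as greater discomfort with the reported score.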

Table 1 shows the mean number of re- 
sponses changed in each of the four condi- 
tions. We may consider the High-High con- 
dition as a kind of baseline; subjects in this 
condition should have little pressure to change 
their responses, since their performance was
excellent and conformed with their expectancy.
In fact, they changed an average of 3.9
responses, which we may attribute to faulty
memory or an attempt to change one or two
responses which they had thought were incorrect.
In the High-Low condition, however,
the subjects changed an average of 11.1 responses.
We would expect the largest number
of changes in this condition, since both variables
which might be expected to produce
changes were operating here. That is, the
performance was both objectively bad and
inconsistent with their expectancy. And indeed,
the number of responses changed was
highest in this condition.


TABLE 1

NUMBER OF RESPONSES CHANGED
ON REPEAT PERFORMANCE

Condition      Mean responses changed
High-High       3.9
High-Low       11.1
Low-High       10.2
Low-Low         6.7


It is the difference between the other two 
conditions which is most interesting for the 
hypothesis proposed here, however. In the 
Low-High condition, although the perform- 
ance was objectively excellent, it was in con- 
flict with the subject’s performance expect- 
ancy, whereas the Low-Low condition pro- 
vided a performance which was objectively 
poor, but in complete agreement with the 
expectancy. The results provide clear support 
for the hypothesis. Subjects in the Low-High
condition changed an average of 10.2 responses;
the mean change in the Low-Low
condition was 6.7. This difference is highly
significant (p < .01, Mann-Whitney U test).
If we interpret the number of responses
changed as a measure of dissatisfaction with
performance, it seems clear that subjects
whose performance was in conflict with their
performance expectancy were less satisfied
with this performance than subjects whose
performance was in harmony with their expectancy.
This was true even though the objective
performance was far superior for the
former group.

Table 2 presents the analysis of variance
for these differences. Since the variances were
different in the various conditions, an approximation
suggested by Smith (1936) and
Satterthwaite (1946) was used. This approximation
reduces the df in the F tests from
36 to 26. The analysis of variance shows some
effect due to the (reported) performance on 
the fifth test. The subjects who were told that 
they had done poorly changed more re- 
sponses than the subjects who were told they 
had done well (F = 8.3, p< .01). This re- 
flects a general desire to do well regardless of 
expectancy. This desire was apparent in the 
behavior of the subjects on the first trial. 
Those subjects who performed well were 
overtly pleased, those who performed poorly 
manifested discomfort. 
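
The reduction in error df can be illustrated with the general Satterthwaite formula for an error term formed by averaging cell variances with equal n. The sketch below is an assumption about the form of the computation, not a reconstruction of the authors' arithmetic: the paper does not report the cell variances, so the variances used here are hypothetical and the function name is introduced only for illustration.

```python
def satterthwaite_df(cell_variances, n_per_cell):
    """Approximate df for an error term formed by averaging equal-n cell variances.

    df = (sum of s_i^2)^2 / sum of (s_i^4 / (n - 1)).
    """
    df_cell = n_per_cell - 1
    total = sum(cell_variances)
    return total ** 2 / sum(v ** 2 / df_cell for v in cell_variances)


# Hypothetical variances: equal in the first call, unequal in the second.
equal = satterthwaite_df([4.0, 4.0, 4.0, 4.0], n_per_cell=10)
unequal = satterthwaite_df([1.0, 2.0, 5.0, 8.0], n_per_cell=10)
```

With four cells of 10 subjects the result equals the usual 36 df when the variances are equal and falls toward 9 as one cell's variance dominates, so a reported value of 26 lies within the range this formula can produce.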

The strongest effect, however, is clearly the 
interaction. Subjects whose performance was 
consistent with their expectancies (High- 
High and Low-Low) changed fewer responses 
on the repeat performance than subjects 
whose performance was inconsistent with their 
expectancies (F = 69.8, p < .001). This re- 
flects the drive to confirm a self-relevant per- 
formance expectancy regardless of whether 
the expectancy concerns a positive or negative 
event. 
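
The main effects and interaction can be checked from the reported cell means alone. The sketch below assumes equal cells of 10 subjects and uses the error mean square of 4.10 from Table 2; the contrast coding and the helper name `contrast_ss` are introduced here for illustration. Because each effect has 1 df, its sum of squares equals its mean square.

```python
# Cell means reported in the text, keyed by (expectancy, fifth-section score).
means = {("High", "High"): 3.9, ("High", "Low"): 11.1,
         ("Low", "High"): 10.2, ("Low", "Low"): 6.7}
n = 10           # subjects per cell
ms_error = 4.10  # error mean square from Table 2


def contrast_ss(weights):
    """Sum of squares (1 df) for a contrast among equal-n cell means."""
    value = sum(weights[cell] * means[cell] for cell in means)
    return n * value ** 2 / sum(w ** 2 for w in weights.values())


contrasts = {
    "Expectancy": {c: (1 if c[0] == "High" else -1) for c in means},
    "Fifth trial": {c: (1 if c[1] == "High" else -1) for c in means},
    "Interaction": {c: (1 if c[0] == c[1] else -1) for c in means},
}
results = {name: (contrast_ss(w), contrast_ss(w) / ms_error)
           for name, w in contrasts.items()}
for name, (ss, f_ratio) in results.items():
    print(f"{name}: SS = {ss:.1f}, F = {f_ratio:.1f}")
```

Under these assumptions the fifth-trial and interaction contrasts come out near 34.2 and 286.2, with F ratios near 8.3 and 69.8, in agreement with the values reported in the text.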

One possible alternative explanation of 
these results is that the subjects, having been 
told that this was a reliable test, were trying 
to do the experimenter a favor by making 
their performance seem more reliable. Al- 
though we cannot completely reject such an 
explanation, we can give some informal evi- 
dence which seems to us compelling. When 
the true nature of the experiment was re- 
vealed, almost all of the subjects refused to 
believe the hypothesis. In fact, when the sub- 
jects in the Low-High condition who had 
changed a significant number of responses 
were informed of this, most expressed surprise 
upon learning that they had changed so
many. They attributed changes either to
faulty memory or to shifting criteria of
judgment.


TABLE 2

ANALYSIS OF VARIANCE

Source            MS
Treatments       109.8
  Expectancy       9.0
  Fifth trial     34.2
  Interaction    286.2
Error              4.10
Total


Although the results were predicted from 
an extension of dissonance theory, these data 
also support assumptions contained in several 
other theories. For example, Lecky’s (1945) 
theory of self-consistency clearly predicts 
such a result. Similarly, Tolman’s (1959) 
notion that disconfirmed expectancies are 
unpleasant is consistent with these data. Im- 
plicit in Kelly’s (1955) theory of personal 
constructs is the assumption that predictable
behavior is desirable. In addition,
clinical observations such as Freud’s descrip- 
tion of the repetition compulsion and Mowrer’s 
(1950) concept of the neurotic paradox could 
be interpreted as being consistent with these 
data. 

Since the result is predicted by all of these 
theoretical approaches, it is curious that there 
has been an absence of clear experimental 
demonstrations of this effect. That is, al- 
though several experiments demonstrate the 
existence of negative affect following the 
disconfirmation of a positive expectancy, it 
is more difficult to demonstrate the existence 
of negative affect following the disconfirmation
of a negative expectancy. For example,
Tinklepaugh (1928) demonstrated that mon- 
keys became quite upset when they expected 
to find a banana under a cup and found a 
lettuce leaf instead—even though monkeys 
normally like lettuce. However, monkeys do 
prefer bananas to lettuce leaves; when Tinkle- 
paugh attempted to reverse the conditions, 
the effect did not appear. 

It should be noted that the present experi- 
ment did not demonstrate the presence of 
negative affect following a disconfirmed nega- 
tive expectancy; it showed only that following 
such a disconfirmation, subjects will take 
steps designed to reaffirm a negative per- 
formance expectancy. In a subsequent experi- 
ment (Carlsmith & Aronson, 1961), evidence 
is presented to show that such a disconfirma- 
tion does lead to negative affect. 


SUMMARY 


Theories of cognitive consistency were extended
to account for individual differences in
self-relevant expectancies. This extension led
to the following prediction: if a person expects
to perform poorly in a particular endeavor, a
good performance will be inconsistent with
his expectancy; he will attempt to reduce
dissonance by denying this performance.

In a laboratory experiment, some subjects
were led to expect to perform a task excellently;
others, poorly. They then performed
the task, were given false scores which
either confirmed or disconfirmed their expectancies,
and were surreptitiously allowed
to change their responses on the task. The
subjects who were given information which
was inconsistent with their performance expectancies
changed significantly more of their
responses than those who were given consistent
information. Thus, subjects who expected
to perform poorly but performed well
exhibited more discomfort (changed more
responses) than subjects who expected to
perform poorly and did perform poorly.
AN APPLICATION OF “CLOZE” TECHNIQUE TO THE
STUDY OF APHASIC SPEECH¹


SAMUEL


University of North Carolina


The free speech of brain damaged aphasic
patients may differ from speech of normal
adults in a variety of ways. Further, within
a group of aphasic patients there may be
marked individual differences in speech. Preliminary
study indicates that several statistical
characteristics of distributions of parts
of speech are sensitive to such individual
differences, as are the ratings of a trained
linguist (Fillenbaum, Jones, & Wepman,
1961). An alternative approach to the study
of speech and speech differences is provided
by the “cloze” technique (Taylor, 1953).
Raters are required to guess the identity of
words which have been deleted from a transcript
of speech; their relative success may
be taken as an index of the predictability or
redundancy of the text. This measure indicates
“in toto the degree of correspondence
of the source’s system of language habits,
including both semantic and grammatical
habits, to those of other users of the same
language” (Osgood, 1959, p. 37).

Applying the cloze technique to the speech
of aphasic patients and nonaphasic subjects
it may be determined whether the speech of
a given aphasic patient is more or less predictable
than that of normals, and differences
among aphasic patients in the predictability
of their speech may also be investigated.
Here, at least two aspects of speech are of
interest: form class predictability, the extent
to which raters supply words of the same
grammatical class (part of speech) as the
missing item, and verbatim predictability,
the extent to which raters can supply
the deleted item. Form class predictability
measures the extent to which context allows
prediction of the kind of word deleted;
verbatim predictability measures the extent
to which context permits a precisely correct
completion. Given that the form class of an
item is correctly selected it will be of interest
to estimate the probability of correct selection
of the specific deleted word. Individual differences
among speakers might be anticipated
for these conditional probabilities, as well as
for form class predictability and verbatim
predictability, taken separately.

1 This study has been supported by Grant M-1849
from the National Institute of Mental Health, National
Institutes of Health. We are indebted to S.
Das Gupta, W. Rice, and Elizabeth Niehl for help
with the analysis of data.


METHOD 


Available for analysis were transcripts of free
speech elicited by the 20 cards of the Thematic
Apperception Test.² A sample of approximately 250
running words was drawn from the transcript of
each of nine aphasic patients and three normal control
subjects. The samples were selected to include
responses to the same TAT cards.³ In each sample
every fifth word was deleted.⁴ Three sets of booklets
were constructed, each composed of four speech
samples, three aphasic samples and one control
sample, the latter coming first. Excerpts from the
texts, as they appear in the booklet, are presented
in Figure 1. These booklets were administered to
two summer school psychology classes at the
University of North Carolina, each of the student
raters being asked to complete one booklet. Instructions
were as follows:


This is a study of language, in particular of
the damaged language of people who have suffered
from various sorts of accidents or strokes.
Your booklet has four pages in it. On each page
there is a sample of speech from a different
person. You will notice that every so often there
is a blank; every fifth word which was spoken
has been left out. Your job is to fill in the blank
with the word you think was used there. Put
in the word that you think will make the best
sense. This is difficult, and often you will have
to guess. Do the best you can, it is very important
that you make a try at all the blanks. You will
have 15 minutes for each page. Don't go on to
the next page until I give you a signal.


Raters were urged to read over each sample before
filling any blanks. To be included in the
analysis, it was required that a rater fill in at
least three-quarters of the blanks on every sheet
of his booklet. Thirty-eight raters met this criterion,
13 completing Booklet 1, 11 completing Booklet 2,
and 14 completing Booklet 3.


FIG. 1. Excerpts from texts of aphasic and normal speech:
responses to Card 2 on the TAT.


2 The speech transcripts of the aphasic patients
and control subjects were obtained, under the
supervision of J. M. Wepman, at the Speech and
Language Clinic of the University of Chicago.
Further information concerning the patients and
subjects may be found in a previous paper (Fillenbaum,
Jones, & Wepman, 1961).

3 Due to differing speech productivity among subjects
some samples included responses to a greater
number of cards.

4 If the fifth word was a neologism or vocal gesture
the word immediately following was deleted.
Neologisms were represented by xxxx in the transcript
samples, and raters were told what the notation
signified. The number of neologisms in the aphasic
speech samples varied from 2 to 19, with a mean
of 8.5.
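
The deletion rule described in the method, including the exception for neologisms given in Footnote 4, can be sketched as follows. This is a simplified illustration: the function name `make_cloze` and the example sentence are hypothetical, vocal gestures are not modeled, and the sketch assumes that the fifth-word count continues from the original positions after a shifted deletion, a detail the text does not specify.

```python
def make_cloze(words, step=5, neologism="xxxx"):
    """Blank every `step`-th word; if that word is a neologism, blank the next word.

    Returns the text with numbered blanks and the list of deleted words.
    """
    text = list(words)
    answers = []
    for position in range(step - 1, len(text), step):
        target = position
        if text[target] == neologism:
            target += 1  # Footnote 4: delete the word immediately following
            if target >= len(text):
                break  # nothing follows the neologism, so no deletion is made
        answers.append(text[target])
        text[target] = f"({len(answers)})"
    return text, answers


# Hypothetical sentence, not taken from the transcripts.
sample = "the man is working in the field with his xxxx horse today".split()
blanked, deleted = make_cloze(sample)
```

In the example the fifth word is blanked normally, while the tenth word is a neologism, so the word after it is blanked instead and the neologism remains visible to the rater.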

In each of the 12 speech samples every word
deleted had been classified in terms of the following
form classes: pronoun, adverb, adjective, verb, noun,
or other (preposition, article, conjunction); the
same system was followed for classifying completions
made by the raters. Thus, for each attempted completion
in every record it was noted whether or
not the guessed word was of the same form class
as the deleted item, as well as whether or not the
guessed word was identical to the deleted item. The
proportion of correct verbatim completions (V)
and that of correct form class completions (FC)
were computed for each record.

Methods of data analysis were selected to answer
several distinct experimental questions. First, for
each of the three booklets, it is desired to determine
whether respondents achieve appreciably different
accuracy for the four different excerpts, either for
form class or verbatim completion. We consider
excerpts fixed, but respondents randomly sampled.
The hypothesis to be tested is that the two sets of
accuracy scores (FC and V) in the population of
respondents fail to discriminate among excerpts. An
appropriate test is provided by multivariate analysis
of variance, with the two completion scores serving
as two dependent variates. (For detailed discussion
of the statistical method used, see Jones,
1960).

It will be noted that the problem is not expressed
simply as one of discriminating aphasic
subjects, as a group, from normal control subjects.
We do not conceive of aphasia as a single, general
language disorder but rather we consider that
language behavior of aphasic subjects presents a
variety of different kinds of difficulties (Jones &
Wepman, 1961; Wepman & Jones, 1961; Wepman,
Jones, Bock, & Van Pelt, 1960). The appropriate
unit of description thus remains the individual
speaker rather than a collection of aphasic speakers.

If excerpts are discriminable, as shown by multivariate
analysis of variance, it becomes of interest
to determine the relative value of the two completion
scores, FC and V, for effecting discrimination,
and also to examine the ways in which the
excerpts are being discriminated. As a by product
of multivariate analysis of variance, there may be
found a particular discriminant function, a linear
combination of the two scores which maximally
discriminates among the excerpts. Examination of
the weights assigned V and FC in this function, and
examination of the mean “discriminant scores” associated
with each excerpt will aid in the interpretation
of differences among speakers.

Finally, even if it is discovered that the two
scores taken together do discriminate among excerpts,
it is of interest to determine whether each,
considered separately, provides such discrimination.
More specifically, is discrimination provided by FC
scores after partialing out effects of V, and by V
scores after partialing out effects of FC? Which
dependent variable provides the better discrimination?
Analysis of covariance serves to provide
evidence concerning these questions.


RESULTS AND DISCUSSION 


The mean values for V and FC, for each 
sample, are presented in Table 1, which also 
indicates for each sample the overall prob- 
ability of correctly selecting an item ver- 
batim, given that its form class has been cor- 
rectly recognized (V/FC). 
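
The V/FC index can be illustrated with a minimal sketch. It assumes, as the text implies, that every verbatim-correct completion is also form-class correct, so that the conditional probability is simply the ratio of the two proportions. The function name and the numbers are hypothetical and are not taken from Table 1.

```python
def conditional_verbatim_rate(v, fc):
    """Return P(verbatim correct | form class correct) as V / FC.

    v  : proportion of deletions completed verbatim
    fc : proportion of deletions completed with the correct form class
    Assumes verbatim-correct items are a subset of form-class-correct items.
    """
    if not 0.0 <= v <= fc <= 1.0:
        raise ValueError("expected 0 <= v <= fc <= 1")
    if fc == 0.0:
        raise ValueError("V/FC is undefined when no form class is correct")
    return v / fc


# Hypothetical proportions for one text (illustration only)
print(conditional_verbatim_rate(0.25, 0.50))
```

A lower ratio for an aphasic text than for its control would mean that raters who have the right form class still miss the exact word more often.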

To assess differences in predictability of
texts within each booklet, multivariate analysis
of variance was utilized, based upon two
dependent variables, V and FC. Two-way
analyses were performed, with the major
classifications represented by texts and raters.
By considering proportions of correct guesses
for “odd” and “even” deletions, within-cell
variance estimates were obtained.

In each analysis, the hypothesis tested is
that no linear combination of form class and
verbatim prediction scores serves to discriminate
reliably the four transcripts. This hypothesis
is rejected for each booklet (p < .001),
as seen from the results (Table 2).
It is clear that cloze scores do discriminate
among the texts; it becomes of interest, then,
to discover any relations between the
scores and the origin of the texts.

TABLE 1 [mean proportions of correct completions; table body not legible]

TABLE 2 [results of multivariate analyses; table body not legible]

Each multivariate analysis of variance may
easily be extended in the form of a discriminant
analysis, yielding a discriminant function of the form

$$Y_{ij} = a_1 V_{ij} + a_2 (FC)_{ij}$$

where $V_{ij}$ and $(FC)_{ij}$ are proportions of correct
predictions achieved by a given rater ($i$) on a given
text ($j$) for verbatim and form class accuracy,
respectively; $a_1$ and $a_2$ are the coefficients which
serve to maximize the ratio of the variance of $Y_{ij}$
between texts to the variance of $Y_{ij}$ within texts.
The coefficients $a_1$ and $a_2$ for each booklet are given
in Table 3. The mean values of $Y$, one for each text,
appear as “discriminant scores” in Table 1.

From Table 1, it is clear that the mean
discriminant scores for texts from normal
speakers tend to be larger than those from
aphasic speakers. (The exception, Subject
Number 105, will be discussed below.) Since
all discriminant coefficients of Table 3 are
positive, larger discriminant scores are
indicative of more successful predictions; thus,
raters are generally more successful in predicting
missing words from normal than from
aphasic transcripts. This tendency is evident
both for verbatim and form class predictions
as is seen from the proportions of successful
guesses, Table 1.
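
The discriminant function described above can be sketched for the two-variable case. This is only an illustration under stated assumptions, not the authors' computation: it uses the standard approach of taking the leading eigenvector of W⁻¹B, where W and B are the within-text and between-text sums of squares and products. The function name, the data layout, and the toy data are all hypothetical.

```python
from math import sqrt


def discriminant_weights(groups):
    """Weights (a1, a2) for Y = a1*V + a2*FC, plus the leading eigenvalue.

    groups: one list per text, each holding (v, fc) pairs, one per rater.
    The weights maximize between-text relative to within-text variation of Y.
    """
    rows = [r for g in groups for r in g]
    grand = [sum(r[k] for r in rows) / len(rows) for k in (0, 1)]
    W = [[0.0, 0.0], [0.0, 0.0]]  # within-text sums of squares and products
    B = [[0.0, 0.0], [0.0, 0.0]]  # between-text sums of squares and products
    for g in groups:
        m = [sum(r[k] for r in g) / len(g) for k in (0, 1)]
        dm = [m[0] - grand[0], m[1] - grand[1]]
        for i in (0, 1):
            for j in (0, 1):
                W[i][j] += sum((r[i] - m[i]) * (r[j] - m[j]) for r in g)
                B[i][j] += len(g) * dm[i] * dm[j]
    det = W[0][0] * W[1][1] - W[0][1] * W[1][0]
    Winv = [[W[1][1] / det, -W[0][1] / det],
            [-W[1][0] / det, W[0][0] / det]]
    M = [[sum(Winv[i][k] * B[k][j] for k in (0, 1)) for j in (0, 1)]
         for i in (0, 1)]
    # Largest eigenvalue of the 2x2 matrix M via the quadratic formula
    tr = M[0][0] + M[1][1]
    dt = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    lam = tr / 2 + sqrt(max(tr * tr / 4 - dt, 0.0))
    # Matching eigenvector, with fallbacks when M is diagonal
    if abs(M[0][1]) > 1e-12:
        a = (M[0][1], lam - M[0][0])
    elif abs(M[1][0]) > 1e-12:
        a = (lam - M[1][1], M[1][0])
    else:
        a = (1.0, 0.0) if M[0][0] >= M[1][1] else (0.0, 1.0)
    norm = sqrt(a[0] ** 2 + a[1] ** 2)
    return a[0] / norm, a[1] / norm, lam


# Toy data: two "texts" that differ only on the first score
text_a = [(0, 0), (1, 1), (0, 1), (1, 0)]
text_b = [(2, 0), (3, 1), (2, 1), (3, 0)]
print(discriminant_weights([text_a, text_b]))
```

In the toy data only the first score separates the texts, so all of the weight falls on it. The sign and scale of the weights are arbitrary; here they are normalized to unit length.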

It should be noted that even if a rater’s guess
represents the correct form class, the probability
that his verbatim prediction is correct is
consistently less for an aphasic text than for a
control text.


Analysis of covariance, adjusting differences in
verbatim correctness for differences in form class
correctness, yielded a significant F (p < .001)
between transcripts for each of the three
booklets; a similar analysis of covariance
adjusting differences in form class correctness
for differences in verbatim correctness yielded
significant F values (p < .001) between transcripts
for Booklets 1 and 2, while for Booklet
3, .01 < p < .025. In terms of the above it is
evident not only that the two cloze scores
allow for discrimination among texts, but also
that each of the scores contributes independently
to such discrimination.

While the results of the several analyses
are generally consistent over the three replications
(booklets) there are some differences,
particularly between the results for Booklet 3
and those for the other booklets. For data
obtained from Booklet 3 the discriminant
coefficient for [...] correctness [...] (Table 3)
[...] results of the covariance analyses [...]
value which results from an analysis of
covariance corresponds to a correlation ratio
[...] (see McNemar, 1955). While F is a [...]
useful for hypothesis testing, and for deciding
[...] three [...] each of the booklets are [...]
Table 4 [...] that for Booklet 3 the [...]
discrimination among texts [...] than is the case
for Booklets 1 and 2. Thus for at least some
aphasic patients, speech appears to be less
discriminable on the basis of form class cloze
scores than on the basis of verbatim scores.
Of the nine aphasic texts only the results
for one, Number 105, do not depart at all
from normal; in every way raters are as
successful in predicting speech for this text
as for its control text (Number 006). There
are other data available on the aphasic
transcripts (see Fillenbaum et al., 1961) and it
is of considerable interest that Transcript
Number 105 was placed very close to normal
when a linguist attempted to characterize
changes in the speech of aphasics, and that
in terms of a number of quantitative indices
of form class usage, stereotypy in language
and sequential dependencies in speech, this
protocol falls within the normal range.

So far it has been observed that the speech 
of the aphasic patients tends to be less pre- 
dictable than that of the normal controls, 
and that even where the form class of an item 
can be correctly determined from the context 
the probability of selecting the item verbatim 
is still considerably less than that for normal
texts. One might ask, further, if there are
differences among aphasic patients in the
predictability of their speech and how these
cloze data relate to other information on the
aphasic speech transcripts, as reported by
Fillenbaum et al. (1961).

Texts Number 106 and 112 yield the lowest
proportion of correct verbatim completions,
and the lowest ratio of verbatim correct to
form class correct completions. It might be
noted that the linguist in his ratings
characterized both of these patients as suffering
from pragmatic defects, where disruption of
the integrative processes in language formulation
leads to speech which cannot be understood
even though phrases appear intact, and
grammatical sequences differ little from normal.
Results here are consistent with that
characterization of the two patients’ speech.

In some respects the cloze scores permit
discrimination among texts that do not differ
in terms of other measures; less frequently
cloze scores fail to discriminate among texts
which differ in terms of other information.
Thus, in terms of all other information
(linguist’s ratings and quantitative indices
of language use) Texts Number 101 and 105 
are considered to be very close to normal; 
in terms of cloze scores Number 105 does not 
depart from normal but the speech of Num- 
ber 101 is clearly less predictable and more 
difficult to reproduce than that of its normal 
control. On the other hand, there is little 
difference in cloze scores for Numbers 101 and 
103; yet their transcripts are quite different 
in terms of other statistical information. The 
above suggests that the cloze procedure, which
here provides some overall index of the
communality between the language of the
aphasic patients and that of the normal raters, 
might be of value particularly when used in 
conjunction with other kinds of data. 

It may be suggested that the conditional 
proportion of correct verbatim completions, 
given correct form class completions (in this 
study, the ratio of the two proportions V/FC) 
should be a measure sensitive particularly to 
semantic selection processes or to change and 
damage in these. It will be recalled that for 
eight of the nine aphasic records this measure 
is lower than for the corresponding control 
value. In terms of information already avail- 
able (Fillenbaum et al., 1961) we know that 
five of these eight transcripts display evidence 
of considerable (semantic) difficulty in word 
selection, and that two of the other three 
transcripts also give evidence of some semantic 
difficulties along with defects of a pragmatic 
kind. Thus, the results obtained by use of 
cloze technique are consistent with those ob- 
tained in a quite different way. 

We have indicated that cloze measures can
be used not only to index overall correspondence
in language between a particular source
(aphasic patient) and audience (normal subjects)
but that different cloze measures taken
in conjunction may provide information as
to the particular kind of change or damage to
language. It should be obvious that other more
differentiated uses of the cloze procedure are
also possible. Given speech samples considerably
longer than those used in this study
one might examine the predictability of different
form classes, thus specifying more
precisely the locus of change or damage in
speech. Given a sufficiently long transcript
one might ask whether or not it becomes more
predictable as the reader becomes better
acquainted with the text; thus, the language
in a text may be idiosyncratic as compared
to normal speech and yet sufficiently consistent
so that these idiosyncracies may be
learned by the audience, or it may be idiosyncratic
and inconsistent, precluding any sort
of learning and, thus, any improvement in
cloze scores.

5 Transcript samples from aphasic patients suffering
primarily from syntactic difficulties, with disruptions
particularly in the grammatical syntactic
sequencing processes of speech, could not be analyzed
effectively by the cloze procedure. These protocols
were so short that there was just not enough continuous
text available for any of the 20 TAT cards
to permit deletion of every fifth item and still leave
any sort of context; if the cloze procedure had been
applied to samples from such transcripts completion
scores would necessarily have been very low.

At least in terms of this preliminary study
it appears that the cloze procedure can provide
valuable information not only on the
overall divergence of speech from normal
but also, more specifically, as to the possible
locus of disruption in encoding. While
the procedure is obviously related to the
information analysis of redundancy in speech
(Osgood, 1959) it differs from such an analysis
in that it utilizes as a criterion the actual
message of the speaker rather than the
certainty of judgment of the audience. In
some ways, this would seem to be an advantage
for, after all, primary interest resides in
characterizing the language of the speaker
(aphasic patient) in terms of its departure
from normal.


SUMMARY 


The “cloze” procedure, which requires raters
to fill in words deleted from a transcript and
provides a measure of the predictability of
speech, reflecting the correspondence of the
speaker’s system of language habits to those
of his audience, was used to study the speech
of aphasic patients. Two measures were obtained
indicating success in verbatim completion
and success in form class completion,
and the relation between these measures was
also examined.

The cloze scores discriminated among transcripts
from aphasic and control speakers, each
of the scores contributing independently and
significantly to such discrimination. The
aphasic speech samples typically were less
accurately completed than normal control
samples, whichever measure was used; and
given that the form class of an item had been
correctly identified raters were still less likely
to hit upon the exact missing word for the
aphasic texts than for the normal control
texts. The results are compatible with other
data already available on these patients; for
example, the difficulty raters have in exact
item identification, even given correct form
class identification, is consistent with knowledge
that most of these patients suffer from
semantic selection difficulties in word finding.

It was noted that the cloze procedure can
provide an overall index of the divergence of
a speaker’s language from normal. Also, in
texts representing speech from aphasic patients,
the nature of the disruptions in speech
might be explored by distinguishing cloze
scores for various form classes and considering
relations among them.
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There has been a multiplicity of studies 
of the relationships between individual personality
and behavior in small groups (Mann,
1959), but there has been little focus on the
interaction of personality patterns among 
members as determinants of group produc- 
tivity. 

A few studies have been done in this area.
Tuma (1955) found that dissimilarity between
therapist and client on the CPI
Dominance scale correlated highly with rated
improvement in a counseling situation regardless
of the direction of the dissimilarity. Other
investigators (Cleveland & Fisher, 1957;
Winch, Ktsanes, & Ktsanes, 1955) have also
explored the interaction of individual personality
patterns as partial determinants of
group performance or group choice. Smelser
(1958) compared compatible and incompatible
groups formed by means of CPI Dominance
scale scores and varied role assignments.
He found that the compatible groups
were more productive on a specially designed
train running task. Schutz (1958) has also
successfully used a conception of compatibility
to study the effects of personality interactions
in small groups. The present study follows
this latter direction of investigation in
an effort to understand more precisely the
influence on group performance of interaction
between members who differ on the dimension
of dominance-submission.

Compatibility is considered to be achieved
in two distinct ways. Personality compatibility
is said to exist when one member of a
group expresses behavior that the other member
of the group wants, or when one member
“pulls for” behavior which the other habitually
expresses. For example, if a person who
habitually expresses dominant behavior is
matched with a person who habitually wants
dominant behavior expressed toward him
then personality compatibility is said to exist.
Role or functional compatibility exists when
the roles given the individual group members
are consonant with their preferred behaviors.
For example, when a person who habitually
expresses dominant behavior is given a
dominant role in a situation and the person
who habitually expresses submissive behavior
is given the submissive role, then the group
is said to be role compatible. Either type of
compatibility may exist without the other or
they may exist together.

When personality compatibility exists, then
habitually expressed behavior patterns contribute
to the effective structuring of the
situation and thus to the solution of the
problem. Conversely, when the situation is so
structured that the individual is permitted
to express his habitual behavior patterns, as
in the role compatibility situation, problem
solving is also facilitated. It is clear that
compatibility, not the variable of dominance-submission,
is considered the crucial determinant
of group interaction, since presumably
compatibility can be established on individual
personality variables other than dominance.
The dominance variable should be considered
as one important determinant of group
interaction and thus of group productivity.

The effectiveness of group problem solving
is thought to be a function of group skills
on the particular problem selected, the
interaction of the personality patterns of the
individual members of the group and the
degree that the situation allows for compatible
interactions. It is possible only to surmise
that the two forms of compatibility may have
potentially different impact on group productivity.
Personality compatibility presumably
contributes to ease of interaction and perhaps
communication between members since each
is participating in a desired or at least
habitual way. Role compatibility, on the
other hand, permits expression of habitual
modes through the function of task assignment.
In these general terms, then, it is
predicted that both forms of compatibility
should facilitate group problem solving. Since
the groups formed for this experiment are
highly specified as to goals and procedures,
role compatibility should be a more effective
predictor of productivity than personality
compatibility.
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scores were put in one pair, so that in each pair 
there would be one dominant and one submissive 
member. A group was called CPI role compatible 
when the dominant member was given the dominant 
role in the later interaction, and CPI role in- 
compatible when the submissive member was given 
the dominant role. Degree of personality compatibility,
measured by the difference between the
Dominance scale scores of the two members of a
pair, was equalized for role compatible and incompatible
pairs. It should be noted that all of the
20 pairs were personality compatible, since in every
pair the dominance scores of the two individuals
were divergent by a minimum of 8 points. When
the dominance scores of the two members of a
pair are the same, or very nearly the same, then
both members tend to be about equally dominant
and the group is said to be personality incompatible.
Ten personality incompatible groups were also
constructed on the basis of the CPI scores. Half of
these were role compatible and half were role
incompatible.

In summary, 60 groups were formed. There were 
10 FIRO personality compatible groups and 10 FIRO 
personality incompatible groups, all of which were 
role compatible. In addition, there were 10 FIRO 
role incompatible groups, half of which were 
personality compatible and the other half of which 
were personality incompatible. Ten role compatible 
and 10 role incompatible groups were constructed 
on the basis of the CPI and all of these groups 
were personality compatible. In addition, there were 
10 CPI personality incompatible groups, half of 
which were role compatible while the other half 
were role incompatible. 

Since subjects in all but 10 of the groups had 
been given the ICL during the first session it was 
possible to obtain scores for individuals in 50 of 
the groups on the equivalent ICL measures of 
dominance-submission. Leary’s (1957) ICL gives,
among other scores, scores on the variables of
managerial-autocratic and docile-dependent behavior.
Leary believes that docility “pulls” strong, helpful
leadership from others, whereas control behavior
provokes others to obedience, deference, and respect.
It is possible, then, to construct personality compatible
(a high managerial, low docile person with
a low managerial, high docile person) and personality
incompatible pairs using this test. It is also
possible to construct role compatible and incompatible
pairs. The following formula was used to
obtain a personality compatibility score for each pair:

$$\mathit{Ip} = |D_1 - S_2| + |D_2 - S_1|$$

Ip is the measure of personality compatibility; as
it decreases, personality compatibility increases.
Absolute measures are used here, as in the case of
personality compatibility with FIRO, since again
the concern is with the size rather than the direction
of the differences. In the above formula $D_1$ is the
dominance score of the first member of the pair,
and $D_2$ the dominance score of the second member.
$S_1$ and $S_2$ are these members’ respective scores on
the submission variable.
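
The personality compatibility score can be worked through in a short sketch. It assumes the formula reads Ip = |D1 - S2| + |D2 - S1|, which is the reading consistent with the statement that lower Ip means greater compatibility. The function name and the scores below are hypothetical and are not values from the study.

```python
def personality_compatibility(d1, s1, d2, s2):
    """Ip = |D1 - S2| + |D2 - S1|; smaller values mean greater compatibility.

    d1, s1: dominance and submission scores of the first member
    d2, s2: dominance and submission scores of the second member
    """
    return abs(d1 - s2) + abs(d2 - s1)


# Hypothetical complementary pair: one high-dominance, one high-submission
print(personality_compatibility(d1=9, s1=2, d2=3, s2=8))  # |9-8| + |3-2| = 2

# Hypothetical pair of two equally dominant members
print(personality_compatibility(d1=9, s1=2, d2=9, s2=2))  # |9-2| + |9-2| = 14
```

The complementary pair scores low because each member's dominance is matched by the other's submission. Two equally dominant members score high.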

ICL role compatibility was determined in a manner
analogous to the determination of FIRO role
compatibility. The following formula was used:

Ir (D, D t S;)

Ir is the measure of role compatibility; as it
increases, role compatibility increases. Person i is the
dominant member of the pair and Person j the
submissive member.

The 50 pairs on which Ip and Ir scores were available
were ranked separately on the basis of these
scores. Ten compatible and 10 incompatible pairs
were chosen on both variables by picking the extreme
high and low scores. The 10 personality
compatible groups had a total Ip of 21, while the
incompatible groups had an Ip of 100. The 10 role
compatible groups had an Ir of 76, while the
incompatible groups had an Ir of -71.

When subjects returned for the second session
they were introduced to each other and given the
following directions:

I am interested in how well people can work
together. On this table you see three pegs. There
are five rings, differing in size, on one of them.
The object of this task is to transfer the rings
from the peg [designated one peg] to this one
[designated another]. There are two restrictions in
moving the rings: (1) Only one ring may be off
the pegs at any one time. (2) You can never put
a larger ring on top of a smaller one. You should
attempt to complete this task with a minimum
number of moves, and as quickly as you can.
You may talk freely with each other about the
moves you wish to make. However, you, Mr.
[Miss] A, will have the final say as to which move
will be made. You must actually formulate each
move verbally, for example, “Move the green ring
to the middle peg,” before Mr. [Miss] B can
move the ring. Only Mr. [Miss] B can actually
move the rings. Remember, you should complete
this task with a minimum number of moves, and
as quickly as you can. Are there any questions?


All questions were answered with restatement of 
the original instructions. 

The instructions were designed to maximize the
necessity for cooperation and to minimize the
possibility of independent action of the group members
so as to require group interaction and communication
on the task. One member was forced to assume
a dominant role (that is, “has the final say”)
while the other person was forced to follow
directions. The assumption is that a submissive person
in the dominant position will be uncomfortable and
hesitant, thus lowering his productive efforts, and
that other role and personality assignments would
provide appropriate difficulties. The situation, however,
permits all combinations of role and personality
compatibility or incompatibility to occur, and
since deviation from assigned roles is not permitted,
a relatively complete observation of the operation
of compatibility in a cooperative context is assured.

TABLE 1

MEAN TIME AND MOVES COMPARISONS BETWEEN COMPATIBLE AND
INCOMPATIBLE GROUPS ON THE THREE TESTS

Moves

Personality

ICL FIRO CPI ICL FIRO
Incompatible 38 : 230 | 249) 1603 | 1499 1649 | 1620
Compatible 181 | 193) 1341 | 1528) 1111) 1561
Difference 2 49 56 262 39 538 59
t 7 33 2 - 1.85 39 93 10; 2.12 24

2
p <.05 <.05 | <.05 <.05

* In seconds

The subjects were asked to complete the three 
peg, five ring problem twice. They were then asked 
to perform a variant of this task, utilizing four 
pegs and eight rings. Their second successful trial at 
this task marked the end of the experiment. 
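
The peg-and-ring task as described matches the puzzle usually called the Tower of Hanoi, although the source does not name it. Under that assumption, the minimum number of moves against which the "moves" criterion can be read is easy to compute. The sketch below enumerates an optimal three-peg solution and uses the Frame-Stewart recurrence for the four-peg variant; the function names are hypothetical.

```python
from functools import lru_cache


def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield an optimal (from_peg, to_peg) move sequence for n rings, 3 pegs."""
    if n == 0:
        return
    yield from hanoi_moves(n - 1, source, spare, target)  # clear the top n-1
    yield (source, target)                                # move the largest ring
    yield from hanoi_moves(n - 1, spare, target, source)  # restack on top of it


@lru_cache(maxsize=None)
def four_peg_min_moves(n):
    """Frame-Stewart move count for n rings on four pegs."""
    if n <= 1:
        return n
    # Park k rings using four pegs, move the rest with three, then unpark.
    return min(2 * four_peg_min_moves(k) + 2 ** (n - k) - 1
               for k in range(1, n))


three_peg = len(list(hanoi_moves(5)))   # 2**5 - 1 moves
four_peg = four_peg_min_moves(8)
print(three_peg, four_peg)
```

If the four scored trials were two attempts at each problem, the smallest possible total moves score would be 2 × 31 + 2 × 33 = 128, a floor against which the group means in the tables can be read.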


RESULTS 


There were no differences between com- 
patible and incompatible groups on their 
average Thorndike and Gallup (1944) Vo- 
cabulary scale scores. This was true for all 
the comparisons made and appeared to in- 
dicate that it was not differences in intel- 
lectual functioning which contributed to the 
differences in performances in the compatible 
and incompatible groups. 

A total time and total moves score was 
calculated for each group by adding their 
time and moves scores on the four trials. 
These scores were used in all comparisons. 
The comparisons between compatible and 
incompatible groups, for both the time and 
moves criteria, are shown in Table 1. The 
moves criterion provided significant dis- 
criminations between compatible and incom- 
patible groups, while the time criterion, in 
general, did not. The moves criterion showed 
significant differences between FIRO-B per- 
sonality compatible and incompatible groups, 
CPI role compatible and incompatible groups, 
and ICL role compatible and incompatible 
groups, while the time criterion showed a 
significant difference only between ICL per- 
sonality compatible and incompatible groups. 
It should be noted that 10 of the 12 com- 


parisons showed differences in the predicted 
direction. 
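
The remark that 10 of the 12 comparisons fell in the predicted direction can be put in perspective with a one-sided sign test. This is an illustrative calculation only, not one reported by the authors, and it treats the 12 comparisons as independent, which they are not, since the same groups enter several of them. The function name is hypothetical.

```python
from math import comb


def sign_test_upper_tail(successes, n, p=0.5):
    """P(X >= successes) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(successes, n + 1))


# Chance of 10 or more "hits" in 12 if direction were a coin flip:
# (66 + 12 + 1) / 4096
print(sign_test_upper_tail(10, 12))
```

Under the independence assumption the probability is 79/4096, a little under .02. Because the comparisons overlap, this should be read as a rough gauge rather than as a significance level.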

In order to see if the efficiency of prediction 
of the productivity in the groups might in- 
crease if the role and personality compati- 
bility variables were combined, groups that 
were both personality and role compatible 
were identified on both the FIRO and ICL 
and were compared with groups that were 
both personality and role incompatible. 

Table 2 shows that the FIRO compatible 
groups finished the task in significantly fewer 
moves than the incompatible groups. They 
also finished the task in less time; however, 
this difference was not significant. The ICL 
compatible groups did better than the incom- 
patible groups on both moves and time. 
Neither of these differences was significant, 
however, the moves criterion falling just 
short. 

TABLE 2

COMPARISON OF MEANS OF COMBINED PERSONALITY
AND ROLE COMPATIBLE AND INCOMPATIBLE GROUPS
ON FIRO AND ICL

                    Moves               Time*
                FIRO     ICL       FIRO      ICL
Incompatible    251.2    236.4     1775.6    1529.6
Compatible      188.6    195.8     1631.8    1415.0
Difference       62.6     40.6      143.8     114.6
t                2.59     1.50       .51       .43
p               <.05     <.10

* In seconds.

The role and personality compatibility
variables were combined across tests to see
whether this might improve the prediction of
productivity. Accordingly, Kpr, Ipr, and Cpr
were separately ranked across all groups, and
the ranks were combined for the different
tests. This resulted in three new variables
of compatibility which were designated by
the symbols KIpr, KCpr, and ICpr. KIpr,
for example, combines both role and personality
compatibility measures on both FIRO
and ICL, and the interpretation of the other
symbols is analogous. Ten compatible and
10 incompatible groups were selected by using
extreme high and low scores on each of these
variables.

All six of the comparisons provided dif- 
ferences in the predicted direction, although 
the only significant difference occurred when 
compatible and incompatible groups on the 
ICpr variable were compared on moves. 

All three compatibility variables, KIpr,
KCpr, and ICpr, may be combined and high
and low compatible groups may be selected
on the basis of the resulting variable, KCIpr.
This comparison summarizes the results
rather well, in that the moves criterion 
measure showed a significant difference be- 
tween compatible and incompatible groups, 
whereas the time criterion measure did not. 
It appears to be generally true that the com- 
patibility variables used in this study predict 
the criterion of moves better than the criterion 
of time. 

It is clear that the various measures of 
compatibility used here are not independent, 
and these measures were correlated to deter- 
mine their degree of dependence. Personality 
and role compatibility scores for each test 
were correlated—that is, Ip was correlated 
with Ir, Cp with Cr, and Kp with Kr—and 
these correlations were .01, .29, and .14,
respectively. Next, the intercorrelations between
Kpr, Cpr, and Ipr were computed, and
these correlations averaged around .32, all of 
them being significant at the .05 level. 

It will be recalled that a compatible personality
group is one in which the dominant
person is the leader, and therefore finally
determines the next move, and that an incompatible
group is one in which the submissive
person is the leader. If intelligence
were positively related to dominance then it
is possible that the compatible groups might
have performed better because their leaders
were more intelligent than were the leaders of
the incompatible groups. Accordingly, individual
FIRO-B expressed control scores,
CPI Dominance scale scores, and ICL
managerial-autocratic scores were correlated with
the Thorndike and Gallup (1944) Vocabulary
score. This was done separately for the males
and females.

RUDOLF H. MOOS AND JOSEPH C. SPEISMAN

The six resulting correlations, none of
which were statistically significant, ranged
between .1[...] and .17. Gough (1956) reports
correlations between dominance and total
scores on five different group intelligence tests
ranging between .13 and .45 for males only
and averaging .29.

Schutz correlated the scores of 248 males
on the FIRO-B expressed control scale with
their scores on a vocabulary scale, a general
information scale, and an arithmetic reasoning
scale. His correlations ranged from .17
[...]. These results indicate that there are
low positive relationships between dominance
and intelligence, that less than 2[...] of the
variance is accounted for by the highest of
these correlations, and that on the average
intelligence scores account for less than [...]
of the variance in dominance scores. It should
also be noted that the correlations found in
the present sample distributed themselves
around zero. The amount of variance attributable
to intelligence, then, does not appear
to be sufficient to account for the observed
differences between the compatible and
incompatible groups.

Since the task itself involved primarily
motor skills, it was hypothesized that males
would do better than females. All male groups
were compared with all female groups, an
equal number of each being compatible and
incompatible. Table 4 indicates a difference
significant at the .01 level on the time
criterion and no significant difference on
moves. It should be noted here that there
were, in all the comparisons mentioned above,
an equal number of male and female groups
in both the compatible and the incompatible
pairs.

TABLE 3

COMPARISON OF MEANS OF MALE AND FEMALE GROUPS
ON THE TWO CRITERION MEASURES

Female
Male
Difference

GROUP COMPATIBILITY AND PRODUCTIVITY

DISCUSSION AND CONCLUSIONS


The results support the contention that
compatibility between members of small
groups is an important contributory factor in
the productivity of such groups. The findings
are more impressive for the fact that the
task set for these groups was a relatively
meaningless one, offering some novelty, but
having relatively few intrinsic motivational
properties. Since the interaction of personal
qualities of the members was demonstrated
to be an effective aspect of the productivity
of the groups under circumstances of low
intrinsic motive properties of the task, then
it becomes evident that exploration of group
activity under more meaningful conditions
will be considerably aided by the utilization
of a concept of compatibility.

The comparisons between the two criteria (time and moves) and between
male and female performance also contribute to the understanding of the
intra- and interpersonal process involved in group productivity. It
appears that all groups, with one exception, took approximately the same
time to complete the tasks, but the compatible groups made fewer false
starts or wrong moves. The ICL personality compatible groups were the
only ones to complete the task in a significantly shorter time than
their matched incompatible group, and while this may be a chance
finding, interesting leads for further research may be discerned on the
basis of the particular personality variables assessed (that is,
autocratic leadership versus docile dependence). That is, there may well
be an interaction between personality variables and the method of
measurement.

The finding that males and females were equal
as to number of moves required but
that females required significantly more time
was congruent with observations made during
the experimental process. The experimenter
reported that the members of female groups
who were required to be submissive were
more likely to delay carrying out the in- 
structions of their active partners with such 
comments as, “I just don’t quite understand— 
say it once more,” “We’re working together 
and I want to understand, please repeat that,” 
etc. The difference was that passive members 
of compatible groups employed this obstruc- 
tionism but carried out the instructions cor- 
rectly, while members of incompatible groups 
delayed implementing the directions and then 
“misunderstood” and made incorrect moves.
Another observation indicated that the sub- 
jects’ desires to establish a good relationship 
would markedly slow down their problem 
solving performance. Often it seemed ap- 
parent that one member had an idea which, 
if carried out, would easily have solved the 
problem, but which he would not or could 
not express definitively, seemingly because 
of fear of his partner’s displeasure at a too 
obvious display of assertiveness and control.

These impressions reported by the experi- 
menter add to the evidence that the dynamics 
of interaction among variables such as role, 
personality trait, and situation cannot reason- 
ably be interpreted except by means of a vari- 
able of compatibility. Furthermore, it be- 
comes clear that other variables of inter- 
personal relationship may be profitably ex- 
plored within the group process setting by 
means of a dimension of compatibility.

SUMMARY 

The purpose of the present study was to
attempt to predict the productivity of specially
constructed compatible and incompatible,
two-person groups. Different methods
of constructing the groups were used, but in
all cases dominance-submission was used as
the relevant interpersonal variable. Two dif- 
ferent conceptions of compatibility—role and 
personality—were defined and utilized. 

One hundred twenty undergraduate psy- 
chology students were initially tested with 
the FIRO-B, the CPI Dominance scale, the 
ICL, and a vocabulary scale. Using each of 
the three personality tests, role and person- 
ality, compatible and incompatible, same- 
sexed, two-person groups were derived. These 
groups were given a simple laboratory task 
to solve. 

The compatible groups outperformed the incompatible groups on the
criterion measure of the number of moves, whereas the measure of time to
complete the task did not provide a reliable predictive function.


Festinger’s (1957) theory of cognitive dis-
sonance has led to a growing body of research
on the consequences of carrying out behavior 
that is contrary to beliefs and values (forced 
compliance). Experiments on forced com- 
pliance (e.g., Cohen, Brehm, & Fleming, 
1958; Festinger & Carlsmith, 1959) have 
demonstrated that the fewer the cognitions 
supporting compliance in discrepant behavior, 
the greater the cognitive dissonance. Four 
recent studies (Brehm & Cohen, 1959; Brock, 
1960; Cohen, Terry, & Jones, 1959; Davis & 
Jones, 1960) have shown that more disso- 
nance is obtained where individuals experience 
a high rather than a low degree of subjective 
choice in engaging in behavior that is con- 
trary to their prior values. 

The present study extends dissonance theory to conflicts created by
inducing aggressive responses in individuals who are opposed to such
aggression. It examines the effects of administering punishment upon the
administrator when he has the choice of withdrawing from the experiment
so that he need not inflict the pain. After such a choice, the greater
the intensity of the aggression, the greater should be the dissonance
and consequent tendency to reduce dissonance. The aggressor may reduce
dissonance by judging the stimulus to be less noxious, saying in effect,
“The pain I administered was really rather mild.” Thus, the first
hypothesis was: choosing to carry out aggression of which one
disapproves leads to minimization of its painfulness. This hypothesis
assumes that a major avenue of dissonance reduction may be the
aggressor’s partial denial that he caused pain, i.e., that his
aggression was really mild and innocuous. If pain minimization is an
avenue of dissonance reduction, it follows that more cognitive
dissonance should lead to more pain minimization. When dissonance is
present it may be assumed that more intense aggression leads to more
dissonance, and therefore the second hypothesis is: when dissonance is
present (the choice condition), the more intense the aggression the
greater the minimization of its painfulness.

1 The comments of A. R. Cohen and Merle J. Moskowitz are gratefully
acknowledged. Thanks are due to Jean McCulla and Lee Glickman for their
services as “victims.” The study was conducted while the first writer
was at the University of Pittsburgh.

Postaggression dissonance might be reduced 
in other ways. The aggressor might judge that 
he actually had little choice in aggressing; 
feeling that he was under obligation to 
deliver noxious stimuli would be consonant 
with the knowledge that he had performed 
such behavior. A change in opinion might also 
reduce dissonance, i.e., adoption of a favor- 
able attitude toward the aggressive behavior. 
These and other alternatives to pain mini- 
mization were assessed. 


METHOD 
Subjects 


The subjects were 41 male and 42 female students 
recruited from introductory psychology classes. They 
were 34% of a larger sample that had been ad- 
ministered a questionnaire; the subjects used reported 
being “opposed” or “very opposed” to the use of 
electric shock on humans in scientific research.


Apparatus and Procedure 


The apparatus was an “aggression machine,” a 
device fully described by Buss (1961). Briefly, the 
subject is told to play the role of an experimenter 
and to administer electric shock to another “subject” 
whenever the latter makes an incorrect response. 
The second subject is really an experimental ac- 
complice who is hereinafter referred to as the 
victim. 

The subject is told to train the victim in a 
conceptual task and he is shown how to present 
stimuli and how to shock the victim. A barrier 
between the subject and the victim prevents visual 
communication, and talking between the two is 
ruled out by the instructions. The victim presents a 
programed series of responses such that the number 
of incorrect responses (and therefore shocks) is 
constant for every subject; in 70 “learning” trials, 
34 shocks are given. The victim surreptitiously turns 
off the shock and records how much is administered.
There are 10 shock buttons, and before the experiment starts, the
subject is shown the value of 4 of these buttons (1, 2, 3, and 5),
ostensibly to acquaint him with “what will be felt by your subject,”
i.e., the victim. The first button gives shock just above the touch
threshold; 2 and 3 are painful and 5 is an extremely noxious stimulus.
The subject is told that the intensity of shock continues to climb
beyond 5, but the subject is not given any more shock at this time.
While experiencing the sample shocks, the subject is asked to rate each
one on a scale running from “can’t feel a thing” to “extremely painful.”
At the end of the experiment the subject was given Shocks 3 and 5 again,
and the before-after change in rating was the principal dependent
variable.

In the High Shock condition subjects were instructed to use only the
shocks at levels 6-10 and nothing below. In the Low Shock condition
subjects were instructed to use only the shock values 1-5 and nothing
above. The victim gasped audibly every time the shock given was 6 or
higher because, in earlier studies (Buss, 1961), it was found to be
unrealistic for the victim to remain silent when receiving the intense
shocks ostensibly delivered at levels 6-10.
Subjects in the No Choice condition then proceeded with the experiment;
the victim was brought in and engaged in a “concept formation
experiment.” In the Choice condition the experimenter gave an option to
leave:

Although you came up here today and have learned what this experiment is
all about and how to do it, I want to emphasize that there is no
obligation to continue if you don’t want to. You can leave if you want
to; some students have preferred not to act as experimenter in this
research and have left. Do you want to continue? Are you sure?

Most subjects nodded at this point. Then the experimenter said again:

You know that it’s entirely up to you whether or not you stay and give
the shocks. Are you willing to do it?


All but three subjects agreed, and the manipulation 
was concluded after each subject had said “Yes” at 
least once. The loss of three subjects who left the 
experiment showed that the option to continue was 
effectively communicated: this loss was not expected 
to seriously bias the results.

After the victim had been “run” and dismissed, the experimenter returned
and asked the subject again to rate the intensity of Shocks 3 and 5 “in
order to check on variations in shock level.” Then the subject completed
a questionnaire consisting of items measuring (a) attitude towards using
shock in research (the same item used to select the subjects),² (b)
obligation to shock the subject, (c) [...] shocks, (d) other alternative
avenues of dissonance reduction. Finally, the experimenter gave the
subject an exhaustive explanation of the experiment and the relevance of
his work to concept formation and asked the subject not to discuss the
experiment with subsequent subjects. There was no evidence that subjects
had prior information concerning the experiment.

² Since it was decided to remeasure attitude after the experiment had
begun, responses on this item were not available for the first 18 of the
83 subjects.


Experimental Design


Four groups of subjects have been described: Choice and No Choice, and
within each of these, High Shock and Low Shock. Since there are known
sex differences in aggression (Buss, 1961), an equal number of males and
females was assigned to the experimenter and victim roles. Thus, there
were four variables in all: choice, shock level, sex of subject, and sex
of victim. This results in a 2 × 2 × 2 × 2 design: 16 cells with five
subjects per cell.
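
The layout of this factorial design can be made concrete with a short sketch. The factor names and level labels below are taken from the text; the dictionary structure and function name are illustrative assumptions, not part of the original report.

```python
from itertools import product

# Factors and levels as named in the design description.
FACTORS = {
    "choice": ("Choice", "No Choice"),
    "shock": ("High Shock", "Low Shock"),
    "sex_of_subject": ("male", "female"),
    "sex_of_victim": ("male", "female"),
}
SUBJECTS_PER_CELL = 5  # stated in the design description


def design_cells(factors=FACTORS):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


cells = design_cells()
total_subjects = len(cells) * SUBJECTS_PER_CELL
print(len(cells), "cells,", total_subjects, "subjects")
```

Sixteen cells of five subjects give 80 subjects in all, which agrees with the 83 subjects recruited less the three who withdrew.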


RESULTS 


Effectiveness of the Experimental Manipulations


The subjects in the High Shock condition invariably gave shock values
between 6 and 10, inclusive; in the Low Shock condition no shocks
greater than 5 were given. Within the narrow range of shock allowed in
each shock condition, amount of shock given was not related to choice or
to the subject’s initial rating of painfulness.


In completing the final questionnaire, the subject indicated on
eight-point scales (a) his perceived degree of “obligation to shock the
subject” and (b) the extent to which he felt whether or not he
administered the shocks was up to him. The High Choice subjects reported
less obligation and more choice than the Low Choice subjects: t = 1.73,
p < .10 and t = 2.67, p < .01, two-tailed.³ Although the difference in
obligation scores did not reach statistical significance at the 5%
level, it was in the expected direction. It was concluded that the
manipulation of choice was generally successful.


Painfulness of the Shock 


The major dependent variable was the change in perceived pain of Shocks
3 and 5 from before to after the administration of shock to victims.
Since the correlation between the change scores for 3 and those for 5
was .76, it was necessary to use only one score, and that for 5 was
chosen. (Comparisons using change scores for 3 yielded results identical
to those for 5.)

³ All statistical tests in this study were two-tailed.

The change scores for 5 are presented in Table 1. The Choice subjects
tended to minimize pain, perceiving the electric shock as less painful
after than before; whereas the No Choice subjects tended to maximize
pain, perceiving the electric shock as more painful after than before.
The significance of these trends was evaluated in an analysis of
variance, which is presented in Table 2. The difference between the
Choice and No Choice groups is highly significant. Furthermore, the mean
change in perceived pain was significantly different from zero for both
Choice (the mean of -10.75 yielding a t of 4.94, p < .001) and No Choice
(the mean of +6.20 yielding a t of 3.05, p < .01).


TABLE 1

MEAN AMOUNT OF CHANGE IN PERCEPTION OF SHOCK
PAINFULNESS BY EXPERIMENTAL CONDITION

Choice
    High Shock    56    13.0
    Low Shock     12.6  6.0
No Choice
    High Shock
    Low Shock

[Column headings and remaining entries not recoverable]


TABLE 2

ANALYSIS OF VARIANCE OF PAIN MINIMIZATION RESULTS

Source
    Choice
    Shock
    Sex of Subject
    Sex of Victim
    Choice × Shock
    Choice × Subject
    Choice × Victim
    Shock × Subject
    Shock × Victim
    Subject × Victim
    Choice × Shock × Subject
    Choice × Shock × Victim
    Choice × Subject × Victim
    Shock × Subject × Victim
    Choice × Shock × Subject × Victim
    Error

[Numerical columns not recoverable]


The only other significant F in Table 2 is for the Shock × Victim
interaction; on further analysis the interaction was found to be
significant only for Choice subjects. The meaning of this interaction
may be found in the Choice data in Table 1. When the victim was a male,
high shock led to more minimization of pain than did low shock. The
results for a male victim are in accord with the second hypothesis of
the study: under choice conditions the greater the magnitude of
aggression (shock administered), the greater the minimization of its
painfulness. The results for female victims are opposed to this
hypothesis, but an explanation for this apparent inconsistency must
await further analysis.
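
The reported means and t values for the tests against zero imply a standard error for each mean change, since a one-sample t is the mean divided by its standard error. The sketch below recovers those implied values. The figure of 40 subjects per choice condition is an assumption (8 of the 16 cells at 5 subjects each), and the function names are illustrative.

```python
import math


def implied_standard_error(mean_change, t_value):
    """Standard error implied by a one-sample t, where t = |mean| / SE."""
    return abs(mean_change) / t_value


def implied_sd(mean_change, t_value, n):
    """Standard deviation implied by the same t, where SE = SD / sqrt(n)."""
    return implied_standard_error(mean_change, t_value) * math.sqrt(n)


# Assumption: 8 cells x 5 subjects in each choice condition.
N_PER_CHOICE_CONDITION = 40

# Means and t values as reported in the text.
se_choice = implied_standard_error(-10.75, 4.94)
se_no_choice = implied_standard_error(6.20, 3.05)
sd_choice = implied_sd(-10.75, 4.94, N_PER_CHOICE_CONDITION)

print(round(se_choice, 2), round(se_no_choice, 2), round(sd_choice, 1))
```

The two implied standard errors are close to each other (roughly 2.2 and 2.0), which is what one would expect if the variability of change scores was similar in the two conditions.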


Alternatives to Pain Minimization


It was conceivable that subjects may have reduced postaggression
dissonance by judging that they were under great obligation to
administer the shocks or by becoming attitudinally more favorable toward
“using shock on humans in research.” In addition to confirming the
effectiveness of the choice manipulation, analyses of variance of the
choice and obligation scores yielded only the following significant
outcomes: a higher order interaction (Choice × Shock × Sex of Subject ×
Sex of Victim) for report of choice (F = 4.8, p < .05); more obligation
expressed under High than under Low Shock (F = 6.9, p < .025); more
obligation when the victim was female than when the victim was male
(F = 4.1, p < .05). The higher order interaction was uninterpretable,
but the relation between shock and obligation was considered consistent
with theory; under High Shock (high dissonance) there is greater force
to reduce dissonance by reporting greater obligation to administer the
noxious stimulation. The lack of other significant comparisons suggested
that postaggression perception of choice and obligation was not
systematically affected by the independent variables. Correlations
between choice, obligation, and pain minimization were nonsignificant in
all treatment combinations (all r’s < .25).


TABLE 3

MEAN AMOUNT OF ATTITUDE CHANGE
BY EXPERIMENTAL CONDITION

Condition
    Choice, High Shock
    Choice, Low Shock
    No Choice, High Shock
    No Choice, Low Shock

[Numerical columns not recoverable; table note fragment: "... favorable toward ... research"]

If attitude change had been employed to 
reduce postaggression dissonance, more change 
in favor of using shock in research would be 
expected under Choice than under No Choice 
and, in the Choice condition, under High 
Shock than under Low Shock. The appropri- 
ate data are presented in Table 3. Although 
the differences between the mean amounts of 
positive change were not contrary to these 
hypotheses, high within-condition variance 
prevented all comparisons from reaching sta- 
tistical significance. The correlation between 
positive attitude change and amount of pain 
minimization was significantly different from 
zero in the Choice, High Shock condition 
(r = +.52, df = 14) but this value was not 
statistically larger than the lower nonsignifi- 
cant correlations obtained in the other ex- 
perimental conditions. 
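
The claim that r = +.52 with df = 14 differs significantly from zero can be checked with the usual conversion of a correlation to a t statistic, t = r·sqrt(df / (1 − r²)). The sketch below is a worked check rather than the authors' own computation; the critical value 2.145 is the standard two-tailed .05 table entry for 14 degrees of freedom.

```python
import math


def t_from_r(r, df):
    """Convert a Pearson correlation to a t statistic with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


# Two-tailed .05 critical value of t for 14 df, from a standard t table.
T_CRIT_05_TWO_TAILED_DF14 = 2.145

t_value = t_from_r(0.52, 14)
is_significant = t_value > T_CRIT_05_TWO_TAILED_DF14
print(round(t_value, 2), is_significant)
```

The resulting t of about 2.28 exceeds the critical value only narrowly, which fits the report that this correlation was not reliably larger than those in the other conditions.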

In addition to distortion of volition and 
attitude change, subjects might have reduced 
postaggression dissonance by (a) regarding 
their participation in the experiment as a 
worthwhile and meaningful experience, (b)
not carrying out the instructions to their own 
satisfaction, (c) regarding the experiment as 
scientifically important. An analysis of items 
referring to these alternatives on the final 
questionnaire showed no differences between 
the experimental conditions. 


DISCUSSION


The first hypothesis was that persons choos- 
ing to carry out aggression of which they dis- 
approved would minimize its painfulness. This 
theoretical derivation was clearly supported: 
Choice subjects minimized and No Choice 
subjects maximized the painfulness of the 
aggressive stimuli they delivered. The present 
results were consistent with the effects of 
choice in other studies reviewed by Cohen 
(1960). The significant maximization of pain 
under No Choice may be understood as a 
straightforward resistance effect: subjects 
opposed to giving shock were forced to do so, 
and their perception of shock painfulness 
increased commensurately with their dislike 
for what they were obliged to do. 

It was also hypothesized that, in the Choice 
condition, the greater the magnitude of ag- 
gression, the greater would be the minimiza- 
tion of its painfulness. This hypothesis was 
supported only when the victim was a male: 
High Shock subjects minimized pain more 
than Low Shock subjects. When the victim 
was a female, this effect was reversed. This 
evidence for an opposing process when females 
were victims might be accounted for in terms 
of dissonance theory. The report of obligation 
is pertinent. Expression of obligation to shock 
was significantly greater under High Shock 
than under Low Shock; also, more obligation 
was reported when the victim was a female. 
Thus, when the victim was a female, the pre- 
ferred avenue of dissonance reduction for the 
aggressor was evidently an increase in feeling 
of obligation, the subject saying in effect, 
“I gave pain but I was obliged to do so.”

It was noted that choice had no effect on 
amount of shock actually given. Such an 
effect might have been expected, especially 
in the High Shock condition: the dissonance 
experienced by Choice subjects could be 
reduced by actually administering lower 
shocks to the victim. However, it is doubtful 
that an adequate test for such an effect was 
provided. The shock range was narrow, and 
High Shock subjects gave as little shock as 
possible, the mean shock value being 6.4 
(the range was 6-10). This tendency to give 
the lowest value permissible was probably 
effective in obliterating any effects of choice. 


Theoretical Implications


This experiment bears on Festinger’s (1957) ranking of cognitions in
terms of resistance to change. He proposed that cognitions referring to
beliefs and opinions are more easily changed than cognitions referring
to behavior or perception of the physical environment. An individual
should have greater control over his private thoughts and beliefs than
he does over his behavior, and further, it should be easier for him to
change his behavior than to modify aspects of the physical environment.
Festinger assumed that the avenue of dissonance reduction most likely to
be used is the one involving those cognitions least resistant to change,
and therefore the most likely avenue is a change in attitude, followed
by a change in behavior, and, lastly, a modification of the environment.

It follows from Festinger’s (1957) analysis that, in the present study,
the preferred avenue of dissonance reduction should be a change in
perceived volition and/or attitude toward the use of electric shock on
humans. A change in the perception of the physical environment (judgment
of the pain of electric shock) should be a much less preferred avenue of
dissonance reduction. The results are clear on this point: feelings of
choice and change in opinion about the use of shock were not related to
magnitude of dissonance, but judgment of pain was. Thus, assuming the
measures of opinion and choice were as sensitive and relevant as that of
pain evaluation, the present results contradict Festinger’s ranking.

It is, however, difficult to equate measures of attitude and of
perceived stimulus effects in terms of sensitivity of measuring
instruments, their relevance to the issue, etc. It was possible moreover
that order of presentation of the avenues of dissonance reduction
produced these results artifactually: opportunity for minimization
preceded opportunity for volitional distortion and attitude change.
Assuming the three responses were equipotent avenues of postaggression
dissonance reduction, they might be expected to be positively associated
under conditions where dissonance was maximal (because there would be so
much dissonance to reduce); where dissonance was low, one or the other
might suffice to reduce whatever dissonance
was present. Thus, positive correlations be- 
tween minimization, volitional distortion, and 
attitude change would be expected under 
high dissonance and inverse relations under 
low dissonance. Actually, however, there were 
no differences between experimental sub- 
groups in the extent to which these variables 
were correlated. Thus, a problem for fur- 
ther research concerns the possibility that 
Festinger’s (1957) hierarchy of cognitions in 
terms of resistance to change is a ranking 
that obtains only for certain types of dis- 
sonance or only within a certain range of 
magnitude of dissonance. Another possibility 
is that subjective judgments of physical 
stimuli are not necessarily highly resistant to 
change and hence may serve in the reduction 
of dissonance. 


SUMMARY 


From cognitive dissonance theory, hypoth- 
eses were derived concerning the aggressor’s 
revaluation of the pain he delivered when his 
aggression was contrary to his beliefs: (a) 
the greater the choice in carrying out such 
aggression, the greater the minimization of 
the painfulness of the aggression; (b) at
least under the choice condition, the greater 
the magnitude of the aggression, the greater 
the minimization of its painfulness. Eighty 
University of Pittsburgh introductory psy- 
chology students, who were opposed to using 
electric shock in research, gave high and low 
intensities of shock to another person under 
voluntary and nonvoluntary conditions. The 
male and female subjects who were ostensibly 
acting as experimenters in a concept forma- 
tion study, were allowed uninhibited delivery 
of painful shocks to male and female victims 
(the experimenter’s confederates). Amount of 
choice, magnitude of shock, sex of subject, 
and sex of victim, were the independent vari-
ables in a 2⁴ factorial design. The main
dependent variable was the aggressor’s change 
in rating of the painfulness of the shocks from 
before to after their administration. 

The results strongly supported Hypothesis a, 
but b was confirmed only when the victim
was a male. With female victims there was 
evidence that expression of great obligation to 
shock was used to reduce dissonance under 


ERRATA


On page 148 of the article “Seating Arrangement and Leadership Emergence,” by 
Lloyd T. Howells and Selwyn W. Becker (J. abnorm. soc. Psychol., 1962, 64, 
148-150), the last sentence of Paragraph 2, Column 1, should read: 


However, Steinzor did not investigate the present program, namely, the relationship between 
seating distance and leadership emergence. 


The original sentence implied that Steinzor had reached conclusions about seating 
arrangement and leadership, and this implication is incorrect. 


On page 227 of the article “Significant Factors in Hypnotic Behavior,” by 
Theodore Xenophon Barber and Louis Benjamin Glass (J. abnorm. soc. Psychol., 
1962, 64, 222-228), the third line of the Summary was omitted. The Summary should 
begin as follows (the omitted line appearing in italics) : 


The primary purpose of this investigation was to test two hypotheses suggested by Wells’ 
(1924) informal studies on “waking hypnosis.” 


ONE-SIDED VERSUS TWO-SIDED COMMUNICATIONS
AND COUNTERCOMMUNICATIONS


CHESTER A. INSKO¹


University of California


In recent research on persuasive communication, a communication that
presents arguments for a given point of view without any mention of the
opposing point of view or attempted refutation of the arguments for the
opposing point of view has been called “one-sided”; a communication that
presents arguments for a given point of view, and then goes on to
enumerate the arguments for the opposing point of view, at least some of
which are refuted, has been called “two-sided.”

The first investigation of the relative effects of one-sided and
two-sided communications was made during World War II by Hovland,
Lumsdaine, and Sheffield (1949). These investigators found no overall
difference in the effectiveness of the two types of communication in
producing attitude change. Lumsdaine and Janis (1953) replicated these
results, and also found that subjects who were initially presented with
a two-sided communication were more resistant to subsequent
countercommunication than were subjects who were initially presented
with a one-sided communication. The countercommunications in each case
were one-sided.

The two present experiments compare the relative effectiveness of
two-sided and one-sided initial communications when the
countercommunications are themselves either one-sided or two-sided. They
examine the four possible combinations: one-sided followed by one-sided,
one-sided followed by two-sided, two-sided followed by one-sided, and
two-sided followed by two-sided.

With regard to the one-sided–one-sided condition neither primacy nor
recency effects were predicted; previous research (Miller & Campbell,
1959) gives no reason to expect either the first or the second of two
opposing communications to be more effective. A recency effect was
predicted for the one-sided–two-sided condition, on the common sense
grounds of the presumably more impressive impact of the two-sided
communication. In the two-sided–one-sided condition primacy was
predicted on the basis of the study by Lumsdaine and Janis (1953).
Finally, in the two-sided–two-sided condition neither primacy nor
recency was predicted.

¹ The author wishes to express his sincere appreciation for the
assistance of M. Brewster Smith, under whose supervision this research
was conducted.


primacy 


EXPERIMENT I

Method

Communication. On the precedent of Miller and Campbell (1959), a
summarized law case provided the experimental communications, having the
advantage that prior unfamiliarity with the issue could be assumed. Four
communications, a one-sided prosecution communication, a two-sided
prosecution communication, a one-sided defense communication, and a
two-sided defense communication, were constructed. The communications
were all approximately 850 words in length. The two one-sided
communications summarized the testimony of a series of either
prosecution or defense witnesses. The testimony of these witnesses
pointed simply to either the guilt or the innocence of the defendant.
None of this testimony attempted to discredit or contradict the
testimony of the opposing side.

The two-sided communications were divided into three parts. The part
which took up the first half of the prosecution and defense
communications, respectively, consisted of a condensed summary of the
material in the one-sided prosecution or defense communications. The
second part, which took up the third quarter of the communications,
consisted of a summary of about half of the opposing one-sided
arguments. The third part, which took up the final quarter of the
communications, consisted of an attempted refutation of some of the
testimony of the preceding witnesses, e.g., testimony that indicated
that some of the previous witnesses had been bribed or that flatly
contradicted some of the testimony of the preceding witnesses.² Thus the
two-sided communications contained a somewhat different type of
information from that contained in the one-sided communications.

² The four communications used in the present study have been deposited
with the American Documentation Institute. Order Document No. 7175 from
ADI Auxiliary Publications Project, Photoduplication Service, Library of
Congress; Washington 25, D. C., remitting in advance $1.25 for microfilm
or $1.25 for photocopies. Make checks payable to Chief, Photoduplication
Service, Library of Congress.

The communications, which were recorded, were presented to the subjects
as the summary arguments
of the prosecution and defense in a supposedly real
bigamy trial. The tape recordings were not pre- 
sented as having been actually recorded at the trial, 
but different speakers were used to present the prose-
cution and defense communications.

Procedure. The subjects, members of the introductory psychology course
at the University of California, were presented with the arguments of
the prosecution and defense, and told that after hearing the arguments
they would deliberate among themselves until they reached a unanimous
decision as to the guilt or innocence of the defendant. The subjects
were told that the experimenter was interested in studying jury
deliberation. Microphones were set up in the room supposedly for the
purpose of recording the deliberation.

Immediately after hearing the prosecution and defense communications the
subjects marked a nine-point rating scale of opinion as to the guilt or
innocence of the defendant. High values on this scale indicated
agreement with the prosecution and low values agreement with the
defense.

This single opinion measure was the only one taken, according to an
after-only design in which each of the four basic conditions was
administered in both a prosecution-defense and a defense-prosecution
order. In all there were eight experimental groups. Each group consisted
of 20 subjects (N = 160).


Results and Discussion 


The means and standard deviation estimates for each of the eight experimental groups are given in Table 1. The Fmax test for homogeneity of variance yielded a value of 1.54 which is not significant at the .05 level.

The four t tests that directly test each of the four predictions are summarized in Table 1. The mean square for the within sum of squares of all eight groups was used in the denominator of the t. As predicted, the one-sided-one-sided condition yielded neither significant primacy nor recency. The difference, however, is in the recency direction by a sizable amount, .74.

Also as predicted, the one-sided-two-sided condition yielded significant recency. However, contrary to prediction, the two-sided-one-sided condition did not yield significant primacy. The mean difference, .35, is in the predicted direction, but is far from significant. Finally, as predicted, the two-sided-two-sided condition yielded neither significant primacy nor recency.


TABLE 1


EXPERIMENTAL DESIGN AND t TEST RESULTS


Group number      1       2
X                 4.43    3.73
s                 1.88    1.94
N                 20      20

First comparison: direction of difference, recency; t = 1.31; df = 152.

(Row labels for order and communication type, namely prosecution first, defense second, defense first, prosecution second, 1-sided, and 2-sided, survive, but the entries for Groups 3 through 8 and the remaining comparisons are illegible.)
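The two computations described in this section, the Fmax ratio and a t test that uses the within mean square from all eight groups as its error term, can be sketched as follows. This is only an illustration: the function names are hypothetical, six of the eight standard deviations are invented for the example (only 1.88 and 1.94 are legible in Table 1), and the within mean square of 3.19 is an assumed value chosen so that the legible difference of .74 gives a t near the legible 1.31.

```python
import math

def f_max(sds):
    """Hartley's Fmax: largest group variance divided by the smallest."""
    variances = [s ** 2 for s in sds]
    return max(variances) / min(variances)

def pooled_within_ms(sds, ns):
    """Within-groups mean square and its degrees of freedom (N - k)."""
    ss_within = sum((n - 1) * s ** 2 for s, n in zip(sds, ns))
    df_within = sum(ns) - len(ns)
    return ss_within / df_within, df_within

def t_with_pooled_error(mean_diff, n_a, n_b, ms_within):
    """t for a difference between two group means, using the pooled error term."""
    standard_error = math.sqrt(ms_within * (1 / n_a + 1 / n_b))
    return mean_diff / standard_error

# First two SDs are from Table 1; the other six are hypothetical placeholders.
sds = [1.88, 1.94, 1.70, 1.80, 1.60, 1.90, 1.75, 1.85]
ns = [20] * 8

ms_w, df_w = pooled_within_ms(sds, ns)
print(df_w)                       # eight groups of 20 give N - k degrees of freedom
print(round(f_max(sds), 2))       # ratio for the illustrative SDs only

# Assumed MS within = 3.19 (not reported in the legible text).
print(round(t_with_pooled_error(0.74, 20, 20, 3.19), 2))
```

Using the error term from all groups rather than from the two groups being compared is what gives each t the full N - k degrees of freedom.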

The main problem presented by the results is to explain why the third prediction was not supported. Why is it, if a one-sided-two-sided sequence yields recency, that a two-sided-one-sided sequence does not yield primacy? When opposed to a one-sided communication why should a two-sided communication be more potent when presented second, but not when presented first? In the two-sided-two-sided condition the position of the communication was of no importance, i.e., neither primacy nor recency was obtained.

The failure of the two-sided-one-sided condition to result in primacy disagrees with the results of the previously mentioned experiment by Lumsdaine and Janis (1953). The Lumsdaine and Janis experiment differs from the present one, however, in the possibly important respect that the communications they employed concerned an issue (Russian capability of producing atomic bombs) about which the subjects had some information and opinions before the experiment began. Hovland, Mandell, Campbell, Brock, Luchins, Cohen, McGuire, Janis, Feierabend, and Anderson (1957) have discussed the importance of previous knowledge on the differential effects of a sequence of arguments. Subjects entered the present experiment completely uninformed about the topic of communications, and thus had no salient opposing opinions when they were presented with the first communication. Under these circumstances, each one-sided communication, when presented first and before the presentation of a countercommunication, is wholly convincing. It is only after hearing a second equally convincing countercommunication that the first communication can be doubted. On this rationale, the advantage of a two-sided communication over a one-sided one is restricted to the second position, in which prior familiarization with the moot issues has given the subject occasion to question the facts as presented. Were the subjects acquainted with the issues of the case at the outset, the two-sided-one-sided condition might be expected to yield primacy as originally predicted. The second experiment was done in an attempt to check on this speculation.


EXPERIMENT II
Method


The procedure of the second experiment differed from the procedure of the first experiment in only one respect. Before hearing the tape recorded communications the subjects were presented with a one-page summary of half of the one-sided prosecution and defense arguments. The arguments were alternated down the page so that two arguments for the same point of view, prosecution or defense, never occurred next to each other. The page of arguments was read silently by the subjects, and then was read aloud and commented upon by the experimenter. The subjects thus were made aware, before hearing the initial communication, that there were definitely two sides to the issue.


Results and Discussion 


The results of the four t tests that test each of the four original predictions are summarized in Table 2. The first prediction, that the one-sided-one-sided condition would yield neither primacy nor recency, was again supported, though an insignificant primacy effect appeared. The second prediction, that the one-sided-two-sided condition would yield recency, was also again supported (p < .01). In contrast with the first experiment, the third prediction, that the two-sided-one-sided condition would yield primacy, was supported (p < .01). And finally, the fourth prediction, that the two-sided-two-sided condition would yield neither primacy nor recency, was again supported.

A summary of an analysis of variance of the data for both experiments is presented in Table 3. The Fmax test for homogeneity of variance yielded a value of 2.57, which is not significant at the .05 level. The total between-groups F is significant (p < .01), warranting the evaluation of particular differences by t test, as previously reported. The two remaining significant Fs are the interactions between second communication types (one-sided or two-sided) and order (prosecution-defense or defense-prosecution) (p < .01). These interactions have their main source in the same differences that yielded the significant t's. The rationale for conducting the second experiment centered on the prediction that the interaction between first communication types, order, and information would be significant. This comparison has an F of 3.31 (p < .07). Since the direction of the difference was previously predicted, however, it is legitimate to test for significance by using a one-tailed t test. The t was calculated simply by taking the square root of F, with 304 degrees of freedom. The resultant t of 1.82 is significant (p < .05). Thus we have definite evidence that only with prior information is the two-sided communication in first position more effective than the one-sided communication in first position. However, a direct comparison of the primacy effect of Condition 2, 0.35, with the primacy effect of Condition s2, 1.40, yields a nonsignificant t (p < .08, one-tailed). Thus although the two-sided-one-sided sequence without prior information did not result in significant primacy, and the two-sided-one-sided sequence with prior information did result in significant primacy, the direct difference between these two primacy effects is not significant. Although these results are not as conclusive as one could hope, they strongly suggest that prior information has an important effect upon the relative impact of subsequent communications on attitude change.


TABLE 2


t TEST RESULTS

(Rows: group numbers, difference between Xs, direction of difference, t, df. The entries are illegible.)


TABLE 3


SUMMARY OF THE ANALYSIS OF VARIANCE

(The source labels and entries are illegible.)
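The conversion from F to a one-tailed t described above can be checked with a short sketch. It relies on the identity t = sqrt(F) for an F with one numerator degree of freedom. The p value here uses a normal approximation, which is an assumption of this sketch (reasonable with 304 degrees of freedom) and not the authors' stated method; the function name is hypothetical.

```python
import math
from statistics import NormalDist

def one_tailed_t_from_f(f_value):
    """Convert a 1-df F into t and an approximate one-tailed p.

    The p value uses the standard normal distribution as a stand-in for
    the t distribution, which is close when the error df is large.
    """
    t_value = math.sqrt(f_value)
    p_one_tailed = 1 - NormalDist().cdf(t_value)
    return t_value, p_one_tailed

t_value, p_value = one_tailed_t_from_f(3.31)
print(round(t_value, 2))   # the text reports 1.82
print(p_value < 0.05)      # one-tailed test at the .05 level
```

Halving the tail in this way is only defensible when the direction was predicted in advance, which is the justification the text gives.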
SUMMARY


In the first experiment subjects were presented with the tape recorded summary arguments for the prosecution and defense in a fictitious trial. Both prosecution and defense arguments were presented in one-sided and two-sided versions, in counterbalanced order, yielding eight conditions. A one-sided communication followed by a one-sided countercommunication resulted, as predicted, in neither primacy nor recency; a one-sided followed by a two-sided resulted, as predicted, in recency; a two-sided followed by a one-sided resulted, counter to prediction, in neither primacy nor recency; and a two-sided followed by a two-sided resulted, as predicted, in neither primacy nor recency.

The second experiment, designed to test an interpretation of the unexpected third finding, differed from the first only in that the subjects, before hearing the first communications, read a partial account of the facts of the trial. In this experiment all four of the predictions were supported.
THINKING OF ACUTE AND CHRONIC SCHIZOPHRENICS 1


PETER M. LEWINSOHN AND ANN RIGGS


LaRue D. Carter Memorial Hospital


In a recent study, Lewis, Griffith, Riedel, and Simmons (1959) reported data to support the hypothesis that schizophrenics have more difficulty in interpreting "oral" proverbs than they do "anal," "phallic," or "neutral" ones. Using four proverbs for each one of these four content areas, they compared the performance of schizophrenics with that of a normal control group. Their findings can be summarized as follows:

1. The overall level of performance of the schizophrenics was below that of the control group.

2. The schizophrenics demonstrated more impairment on the oral proverbs than they did on the anal and phallic ones. No such differences were found in the control group.

The meaning of these results, however, is rendered unclear because of the following considerations: The groups were not matched for intelligence; since no nonschizophrenic psychiatric control group was used, it is logically possible that the findings are characteristic of other psychiatric groups as well as of schizophrenics; and the schizophrenics were heterogeneous with respect to length of illness, the mean for the group being 4.4 years (± 5.05) with a range of 1 month to 18 years. It is thus not clear whether the results apply to acute or chronic patients, or both.

Since the present authors felt that the finding of a selective impairment of performance on proverbs having an oral content is of considerable importance for the understanding of the nature of intellectual impairment in schizophrenia, an attempt was made to replicate the study of Lewis et al. (1959) using an additional nonschizophrenic control group and dividing the schizophrenics into acute and chronic groups.

1 Parts of this paper were presented at the 1961 meetings of the Midwestern Psychological Association in Chicago, Illinois.


METHOD 


Four groups of subjects were used. Group 1 consisted of recently admitted acute schizophrenic patients from an intensive treatment hospital. To be included, a patient had to satisfy two criteria: To have been diagnosed schizophrenic by the psychiatric team, and to manifest at least one schizophrenic symptom on Lorr's Multidimensional Scale for Rating Psychiatric Patients (MSRPP) as rated independently by a psychologist on the basis of an hour-long interview.2 Group 2 was a mixed group of patients from the same hospital who met the following criteria: They had received a diagnosis other than schizophrenia, and they were not rated as possessing schizophrenic symptoms on the MSRPP. Group 3 consisted of chronic schizophrenics from a state hospital 3 who had been ill for at least 6 years, had been continuously hospitalized for at least 3 years, and whose history included evidence of disturbed thinking and bizarre behavior. No patient who was suspected of being brain damaged was included in this study. Group 4 was selected from a large and heterogeneous group of students and general medical patients. Subjects in the four groups were matched on the vocabulary subtest of the Shipley-Hartford Retreat Scale. The groups were roughly equated with respect to age and sex distribution. The results of the matching are shown in Table 1.

All subjects were given the same multiple-choice proverbs test used by Lewis et al. (1959). This test consists of 16 proverbs, 4 for each of the content areas. Four alternate answers are given for each of the proverbs: An abstract answer (Score, 2), a partially abstract answer (Score, 1), a concrete answer (Score, 0), and an irrelevant one (Score, 0). Total scores for each of the content areas were computed for each subject. The reliabilities of the oral, anal, phallic, and neutral proverbs, as estimated by the Spearman-Brown formula, are .79, .73, .7 and .72, respectively.
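The scoring scheme and the reliability estimate described here can be sketched as follows. The answer weights (2, 1, 0, 0) and the four content areas come from the text. The data layout, the function names, and the use of the two-half form of the Spearman-Brown formula are assumptions of this sketch, since the article does not say how the halves were formed.

```python
CONTENT_AREAS = ("oral", "anal", "phallic", "neutral")

# Weights given in the text for the four alternate answers.
ANSWER_SCORES = {"abstract": 2, "partial": 1, "concrete": 0, "irrelevant": 0}

def area_totals(responses):
    """Sum a subject's scores within each content area.

    `responses` is a list of (content_area, answer_type) pairs,
    one pair per proverb (16 in the test described).
    """
    totals = {area: 0 for area in CONTENT_AREAS}
    for area, answer in responses:
        totals[area] += ANSWER_SCORES[answer]
    return totals

def spearman_brown(r_half):
    """Step a half-test correlation up to an estimate for the full-length test."""
    return 2 * r_half / (1 + r_half)

# Hypothetical subject: abstract on every oral proverb, partial elsewhere.
subject = [("oral", "abstract")] * 4 + [
    (area, "partial") for area in ("anal", "phallic", "neutral") for _ in range(4)
]
print(area_totals(subject))
print(round(spearman_brown(0.5), 3))
```

With four proverbs per area and a top weight of 2, each area total can range from 0 to 8.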


RESULTS 


The mean scores of the four groups for the four types of proverbs are shown in Table 2. These results were subjected to an analysis of variance.4 Differences between the content areas were significant for the acute schizophrenic group (p < .01) but not for any of the other groups. Comparing the individual means in Group 1 by means of Tukey's gap test showed the scores obtained by the acute schizophrenics on the oral and phallic proverbs to be significantly smaller than their score on the neutral proverbs (p < .05).

Groups 1, 2, and 4 do not differ in their performance on the neutral proverbs.

The performance of the chronic schizophrenics is significantly below that of the other groups on all proverbs (p < .001).


TABLE 1


MEAN AGE AND VOCABULARY SCORES OF THE GROUPS

(Rows: acute schizophrenics, chronic schizophrenics, nonschizophrenic psychiatric, normal control. The entries are illegible.)


TABLE 2


MEAN PROVERBS TEST SCORES FOR THE SCHIZOPHRENIC AND CONTROL GROUPS

(Rows: acute schizophrenics, chronic schizophrenics, nonschizophrenic psychiatric, normal control, each divided into male and female. The entries are illegible.)


2 In order for a subject to be included in the acute schizophrenic group, he had to obtain a score of 17 or more on 12 items from the Perceptual Distortion, Conceptual Disorganization, and Paranoid Projection scales of the MSRPP.

3 The authors wish to express their thanks to Werner Kuhn and Jack Small for their kind help in this study.


DISCUSSION AND CONCLUSIONS 


The results provide partial confirmation of the findings of Lewis et al. (1959) in that content exerted a significant influence upon the performance of the acute schizophrenics. This group showed impairment on the oral as well as on the phallic proverbs but not on the neutral and anal ones.

The chronic schizophrenics differed from the acutes in that content was not important in their performance. Instead, they showed generalized impairment which extended to all the proverbs.


4 The complete tables on the analysis of variance have been deposited with the American Documentation Institute. Order Document No. 7194 from ADI Auxiliary Publications Project, Photoduplication Service, Library of Congress; Washington 25, D. C., remitting in advance $1.25 for microfilm or $1.25 for photocopies. Make checks payable to Chief, Photoduplication Service, Library of Congress.
The work of McClelland (1953) and Atkinson
(1958) has spearheaded considerable investiga- 
tion of the relationship of various indices of 
fantasy of, or desire for, achievement to more 
direct measures of accomplishment. While dif- 
fering investigations have involved a variety of 
ways of measuring the individual's expression of 
such a need and have used varied criteria of 
achievement, attention has frequently been drawn 
to the relation of n Ach, as measured through 
thematic analysis, to academic performance. 

McClelland (1953, p. 237) early reported an 
r of .51 between n Ach scores and grade-point 
average among college students. Later studies 
have tended to report lower correlations. Thus 
Morgan (1952, 1953) found r’s ranging from .46 
to —.21 for populations of high school students 
once he had partialed out the effects of intel- 
ligence. Studying college students he reported 
significant differences, with those students high 
in n Ach showing better grades. These latter 
findings were not translated into a correlation 
coefficient, however. Ricciuti (1954) reported no
significant r at the college level, although he did
report positive r's ranging from .16 to .29 among
high school students, even after the effects of
intelligence were partialed out of his data.

Doubt is cast on the nature of any generality which may be made regarding the relation of n Ach as measured through psychometric or thematic methods to other behavioral criteria of achievement by the studies of Lazarus, Baker, Broverman, and Mayer (1957) and Broverman, Jordan, and Phillips (1960), where inverse relationships are reported between the two variables. These latter authors prefer a "substitution" or "alternative channels" hypothesis to one which would see fantasy as a predecessor to action. Thus the literature provides examples of both positive and inverse relationships between the criterion variables as well as studies reporting negligible relationships. Only scant attention seems to have been paid to attempts to reconcile the conflicting reports, with only Broverman et al. basing their efforts on an acceptance of all the findings as legitimate within their individual settings. This problem will be returned to later in the present paper.

The purpose of the present paper is to report two studies relating n Ach as measured by McClelland's (1953) test of Achievement Imagery to academic performance. When the first of these studies produced findings contrary to the initial hypothesis of a positive relationship, the second study was carried out to test one possible explanation. Both studies produced an inverse relationship between n Ach and grade-point average in college males.

1 Based on paper presented at the Western Psychological Association, Seattle, Washington, June 16, 1961.


STUDY I
Hypothesis and Methodology


The first study involved two groups of upper division students, numbering 23 in each group. One group consisted of those students who had been selected for participation in an all-college honors program, on the basis of outstanding performance during the first two years of college. Selection was by action of a faculty committee, following a study of the candidate's grade-point record, his scores on the College Entrance Examination Board tests, and impressions gained from a brief interview. All candidates had been nominated for the program by other members of the faculty and had expressed an interest in participating.

A second group was selected to match this group of superior achievers on the basis of years in college, sex, major field of study, and College Board scores. This second group was selected from students who had maintained average grade records, neither being dangerously low, nor outstandingly high.

The hypothesis of the first study was that the superior students would show higher scores for Achievement Imagery (AI), based on McClelland's (1953) test, than would be true for the average students.

All subjects were administered the McClelland (1953) test in a group situation. The procedure followed that described by McClelland. Four pictures were presented on the screen, each picture for 4 minutes. These were those designated A, G, H, and B by McClelland, and were presented in that order. A and G are Cards 7BM and 8BM of the TAT. Picture H portrays a young man looking up from an open book before him on a desk. Picture B shows two men beside a piece of machinery.

The subjects were provided with four sheets of paper, one for each story. Each sheet carried a reminder to include past, present, future, and emotional dimensions to the story. The directions were those specified by McClelland (1953, p. 98) and an attempt was made to emulate the procedures he suggests to produce a "neutral" atmosphere. Thus the subjects were advised that the testing was for a research study and the procedure was carried out in a manner intended to convey seriousness of purpose.

Prior to the scoring of the research protocols, four of the present authors practiced for several weeks scoring the precoded protocols prepared by McClelland (1953). These were reproduced by a secretarial assistant and scored independently by each judge, with these four scorings then compared in order to arrive at a consensus scoring before checking this against McClelland's. When this consensus scoring reached a point approximating 80% agreement with McClelland, the scoring of the research protocols, each again scored by the four judges independently, was begun. Again a consensus was eventually reached on each story. The mean percentage agreement, prior to reaching consensus for the research protocols, was


Results 


The total AI score for each subject on his four stories represented his score. Scores for subjects in the two groups were compared. Inspection showed no differences at all for women, a finding compatible with that reported by Veroff, Wilcox, and Atkinson (1953). Most of the studies reported in this area have dealt only with male subjects.

Comparing the achievement imagery scores in the two groups of males by the Wilcoxon matched-pairs method revealed a difference significant at beyond the .01 level. The males in the average group had significantly higher AI scores. There were 15 males in each group.

Studying the percentage of stories in which AI appeared suggested that for the average achieving males, the frequency of such appearance was about what should have been expected for the neutral testing conditions which had been sought. Thus McClelland (1953) reports that under such conditions 48% of the male protocols were scored for AI, while in the present study, 46% of the male protocols in the average achieving group could be so scored. It was the low incidence of AI in the protocols of the superior males which led to the significant findings reported above. Only 28% of the stories produced by this group contained scoreable achievement imagery, despite the presumed equivalence of the testing conditions.
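For readers unfamiliar with the Wilcoxon matched-pairs method mentioned above, the test statistic can be sketched as follows. The scores below are hypothetical and are not the study's data; the function name is also an assumption. The sketch computes only the statistic T (the smaller of the two signed-rank sums), which would then be referred to a table of critical values for the number of nonzero pairs.

```python
def wilcoxon_signed_rank(pairs):
    """Return (T, n) for matched pairs of scores.

    Zero differences are dropped, absolute differences are ranked with
    tied values sharing their average rank, and T is the smaller of the
    positive and negative rank sums.
    """
    diffs = sorted((a - b for a, b in pairs if a != b), key=abs)
    ranks = [0.0] * len(diffs)
    start = 0
    while start < len(diffs):
        end = start
        while end < len(diffs) and abs(diffs[end]) == abs(diffs[start]):
            end += 1
        average_rank = (start + 1 + end) / 2  # mean of ranks start+1 .. end
        for k in range(start, end):
            ranks[k] = average_rank
        start = end
    rank_sum_pos = sum(r for d, r in zip(diffs, ranks) if d > 0)
    rank_sum_neg = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return min(rank_sum_pos, rank_sum_neg), len(diffs)

# Hypothetical (superior, average) AI scores for five matched pairs.
example_pairs = [(1, 3), (2, 5), (4, 4), (6, 2), (3, 8)]
print(wilcoxon_signed_rank(example_pairs))
```

A small T means that nearly all of the larger differences point the same way, which is the pattern the text reports for the males.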


Discussion 


These results contrast sharply with those re- 
ported by McClelland (1953) and appear to be 
in line with the hypotheses put forth by Lazarus 
et al. (1957) and Broverman et al. (1960). 
Broverman et al. have drawn attention to the 
interpretations subjects may make of a testing 
situation and consideration of this as relevant in 
the present study seemed appropriate. Despite 
the attempted neutral conditions, it appeared 
possible that these might not have in fact been 
in force. The honors program was new to the 
college at the time of the study and the bases 
for admission were still ambiguous to many 
students. It seemed possible that the males in the average group might have, in some instances, been hopeful that the testing would be used as a basis for further admission to the honors program. While such an assumption faces the difficulty that the frequency of AI produced closely approximated that reported by McClelland under neutral conditions, the frequency of AI which he reports under conditions specifically designed to encourage achievement fantasies was only 55%. The possibility that some of the males in the "average" group of the present study may have hoped that their performance might in some way relate to possible later admission to the honors program is enhanced by the fact that no effort was made to disguise the fact at the time of data collection that stories were being elicited from those students who were already in the honors program.

Continuing this line of thought, there seemed no reason to believe that the males already in the honors program could perceive any relevance other than to research tied to their participation in the study. They had already been granted the highest honor available "in course" to a student, and could perhaps thus be quite relaxed about the testing. This idea is supported by the fact that the frequency with which their stories contained AI (28%) very closely approximates the 25% figure quoted by McClelland (1953) for testing males under "relaxed" conditions.

The results of the first study and the speculations just outlined made it desirable to test a group of students who were at the beginning of their college careers and who had as yet no feedback as to success in college. This became the basis of the second study.
STUDY II
Hypothesis and Methodology


The second study was similar to the first except that the subjects were freshmen tested during their first week of college, and before they had received any evaluations of papers, exams, etc. Because the first study had shown that all of the differences in AI had occurred on two of the pictures (G and H) the testing was confined to use of these stimuli. The procedure for administration was as before.

One hundred and thirty freshmen were tested in this manner. At the end of their first semester comparisons were made between their AI scores and grade-point averages. The protocols in the meantime had been scored in the manner described above, except that only three judges were involved.

Two types of comparisons were drawn. The first compared AI scores for those with the highest first semester grades with those with the lowest. The second method of comparison took ability into account, and defined academic success in terms of the discrepancy between the grades received and the grades which might be predicted on the basis of the established correlation between College Board verbal scores and grade average for the individual class involved.

Based on the findings of the first study, it was hypothesized that there would be an inverse relationship between AI scores and academic performance.


Results 


For the group as a whole, AI appeared in 51% of the stories, again approximating the frequency reported for neutral conditions by McClelland (1953).

Neither method of analysis described above revealed any significant differences for the women students. As before, discernible relationships among the variables were restricted to the men.

The first method of analysis compared the AI scores of those men with highest grades with those with the poorest performance. Including all of the males from the sample of 130 students whose grades exceeded the class average by either plus or minus one standard deviation allowed only six males in each group. A comparison of their AI scores by the Mann-Whitney U test showed a difference significant at the .06 level in the direction predicted by the hypothesis. The higher achieving males had lower AI scores.
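With groups as small as six, the Mann-Whitney U test can be evaluated exactly by enumerating every way the pooled scores could be split into two groups. The sketch below illustrates that idea with invented scores; it is not a reanalysis of the study's data, and the function names are hypothetical.

```python
from itertools import combinations

def mann_whitney_u(x, y):
    """U for x: count of (x_i, y_j) pairs with x_i > y_j; ties count one half."""
    u = 0.0
    for a in x:
        for b in y:
            if a > b:
                u += 1.0
            elif a == b:
                u += 0.5
    return u

def exact_one_tailed_p(x, y):
    """Proportion of all regroupings whose U is at most the observed U.

    This tests the directional hypothesis that scores in x tend to be
    lower than scores in y.
    """
    pooled = list(x) + list(y)
    observed = mann_whitney_u(x, y)
    as_extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(x)):
        chosen_set = set(chosen)
        group_x = [pooled[i] for i in chosen]
        group_y = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        if mann_whitney_u(group_x, group_y) <= observed:
            as_extreme += 1
        total += 1
    return as_extreme / total

# Hypothetical AI scores: three high achievers versus three low achievers.
high_achievers = [1, 2, 3]
low_achievers = [4, 5, 6]
print(mann_whitney_u(high_achievers, low_achievers))
print(exact_one_tailed_p(high_achievers, low_achievers))
```

For two groups of six there are 924 possible regroupings, so the enumeration remains cheap.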

This method did not take into account the r of .41 between grades and College Board verbal scores which was established for the total sample. Inspection revealed that the 12 men identified by the method just described did indeed differ in their measured verbal abilities. Using the regression equation derived from the r of .41, a predicted grade was established for each person. An overachiever was defined as one who exceeded his predicted grade by more than one standard error of estimate, and an underachiever as one who fell below it by a like amount. On this basis it was possible to identify 5 male overachievers and 13 male underachievers. Creating eight matched pairs and using the Wilcoxon method as in the initial study revealed differences significant at the .06 level in the direction predicted by the present hypothesis. The overachievers tended to have lower AI scores than the underachievers.
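The classification of overachievers and underachievers can be illustrated with a short sketch. It assumes, as the passage appears to state, that the criterion is a deviation of more than one standard error of estimate from the grade predicted by a simple linear regression on College Board verbal scores. Only the correlation of .41 comes from the text; the means, standard deviations, student values, and function name are hypothetical.

```python
import math

def classify_achievement(grade, verbal, mean_grade, sd_grade,
                         mean_verbal, sd_verbal, r=0.41):
    """Label a student by comparing the actual grade with the predicted grade.

    Predicted grade comes from the regression of grade on verbal score;
    the cutoff is one standard error of estimate, sd_grade * sqrt(1 - r^2).
    """
    slope = r * sd_grade / sd_verbal
    predicted = mean_grade + slope * (verbal - mean_verbal)
    standard_error = sd_grade * math.sqrt(1 - r ** 2)
    residual = grade - predicted
    if residual > standard_error:
        label = "overachiever"
    elif residual < -standard_error:
        label = "underachiever"
    else:
        label = "as predicted"
    return predicted, label

# Hypothetical class statistics: grades (mean 2.5, SD 0.6), verbal (mean 500, SD 100).
for grade in (3.4, 2.8, 2.0):
    print(grade, classify_achievement(grade, 600, 2.5, 0.6, 500, 100))
```

Because r is only .41, the standard error of estimate is about 91% of the grade standard deviation, so relatively few students fall outside the band. That is consistent with the small groups the text reports.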


Discussion


Although the levels of significance fell short of those desired for a firm conclusion, the direction of findings was completely compatible with those reported in the first study. Throughout both studies no differences were found for women, while among the men lower AI scores were associated with higher performance.

It is apparent that the studies in the literature have not been consistent in defining the nature of the relationship between scores on tests of achievement imagery and other behavioral criteria. Our findings are in marked conflict with those reported by McClelland (1953) and disagree with those reported by Morgan (1952, 1953) and Ricciuti (1954), although the latter failed to support the strong relationships reported earlier. Atkinson (1957) has attempted to relate the notion of risk taking to some of the results, and Broverman et al. (1960) suggest that positive relationships between AI and performance in laboratory tasks may be accounted for on the same basis that inverse relationships may be explained for the correlation of n Ach scores and real life situations. Specifically Broverman et al. suggest that if fantasy of achievement is a substitute for achievement oriented behaviors in major life endeavors, the same fantasies may stimulate achievement in the less critical laboratory-task setting. It appears that considerably more may be needed by way of insight into the perceptions that subjects have of the situations in which they find themselves before the valid nature of consistencies, if indeed such consistencies exist at all, may be determined for this area of research. It is tempting to assume that in the first study reported here the superior males found themselves in a position where, despite the attempted neutral conditions, they could "afford" to be "relaxed" and uninvolved in a way that did not apply to the other subjects of the study and that this situationally determined set was more relevant than the attempted neutral set sought by instructions.

From such reasoning came the second study. However, it now seems legitimate to inquire into special factors which may have been acting to make the first week of college a rather atypical period for testing such as was done here. Atkinson (1957) and Moulton have suggested that fear of failure may in some cases be a more accurate predictor of performance than a more positive desire to excel. Casual analysis of the data in the second study allows the hunch that perhaps the first week of college may be a period in which concerns over possible failure may be particularly strong. Such a possibility is being investigated at present.

It is possible, moreover, that all four pictures in the McClelland (1953) test do not elicit the same type of imagery. The results of the present study were based on two of the pictures and these two may call forth negative imagery related to fear of failure.

Studies in the general area covered by this paper have ranged over a variety of criterion measures for both independent and dependent variables. The conflicting results may reflect such diversity and raise questions as to whether a common construct is being dealt with throughout, as well as the equal adequacy with which various measures would tap such a construct.

It seems equally likely 
findings are a function of 
not 


construc 
that discrepancies in 
the operation of dif- 
determined DY the 
instructions, but an outgrowth of the varied 
circumstances in which asked to 
participate, and the various subjects used. Thus 
it is suggested in the first study reported here 
that the directions did not 
create the for both groups of 
subjects. The second study may 
different 


ferent sets necessarily 


subjects are 
necessarily 


male 
have found the 


Same 
same sets 


respondents in yet a situation, one in 
which the omnipresence of a 
challenge, prior to 
adequacy to meet the challenge 
fear of failure more than a more 
tive striving to achieve. 

If these viewpoints are valid, then future re- 
search in this area may have to give more detailed 


rigorous academic 
the student’s 
may have left 


“proot ol 


salient posi- 
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attention to the set of the than has been 
done in the past 
that this 
controlled merely by 
tend the 


subject 
It appears overly simplified to 
ime variable can be automatically 


iss 


the instr tions which at 


experiment 


SUMMARY 


In the present study two experiments attempted to establish the
relationship between scores on McClelland’s (1953) test of
achievement imagery and academic performance. In the first, superior
male students had lower scores than average male students. In the
second, the direction of the findings was similar, although the
findings did not reach an acceptable level of significance. In
neither study were the findings significant for women students. It is
suggested that further research must attend more closely to the set
which the subject brings to the experimental situation.


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THE RELATIONSHIP OF ANXIETY, SELF-CONCEPT, 





AND SOCIOMETRIC STATUS AMONG FOURTH, FIFTH,
AND SIXTH GRADE CHILDREN


FRANCES DEGEN HOROWITZ 


Bureau of Child Research, University of Kansas


The role of anxiety as a variable in human behavior has long
attracted the attention of the psychologist and psychiatrist.
Following the publication of an objective measuring scale of anxiety
for adults (Taylor, 1953) a similar measure for children (Castaneda,
McCandless, & Palermo, 1956) was developed: The Children’s Manifest
Anxiety (CMA) scale. The CMA scale has been used in the experimental
investigation of learning involving drive as a variable (Castaneda,
1956; Castaneda, McCandless, & Palermo, 1956) but has been more
widely used to study anxiety as a variable in personality development
and social behavior. McCandless, Castaneda, and Palermo (1956) early
reported negative correlations between anxiety and sociometric status
among fourth and fifth graders such that the more anxious children
tended to be less popular. The significant correlations they obtained
ranged from -.28 to -.75. Sixth graders in their sample showed no
significant relationship between the two variables.

Subsequently, Trent (1957) reported similar findings among teenage
delinquent boys, indicating a correlation in the neighborhood of .29.
He listed seven supporting bibliographical references indicating
general corroboration in sociometric research using other indices of
anxiety.

The relationship between anxiety and self-concept has also been
investigated. Lipsitt (1958) developed a children’s self-concept
scale for use with upper elementary school subjects and reported a
negative correlation between the CMA scale and the self-concept scale
among fourth, fifth, and sixth graders. Significant correlations
ranged from -.34 to -.63. Coopersmith (1959) also found a similar
relationship. In addition he reported that fourth, fifth and sixth
graders in his sample showed a positive (.34) correlation between
self-esteem and popularity. Thus it would seem that high anxiety
tends to be associated with poor self-concept and low sociometric
status. The present study sought to investigate these relationships
on a sample of upper elementary school children in Oregon, measuring
the three variables (anxiety, self-concept and sociometric status)
within one study.

1 This article is based upon a paper read at the 1961 meeting of the
Society for Research in Child Development at University Park,
Pennsylvania. The study was conducted while the author was at
Southern Oregon College. A grant from the Southern Oregon College
Research Council supported parts of the statistical analyses reported
here. The author wishes to express her appreciation to the teachers
of Lincoln School for their cooperation in the conduct of this study.


METHOD 


To 40 fourth graders, 51 fifth graders, and 20 sixth graders (111 in
all), classroom teachers administered the CMA scale, the Children’s
Self-Concept Scale (SC) and a ranking sociometric (SOC). In the
sociometric task girls ranked female classmates, boys ranked male
classmates. Each subject received a sociometric score which was
arrived at by averaging all the ranks given him by the other children
of his own sex. The lowest score represented the most popular child,
the highest the least popular. However, for the sake of clarity, the
signs of the correlation coefficients have been changed so that a
negative correlation with anxiety means that there is a tendency for
the more anxious children to be less popular and a positive
correlation with self-concept means that there is a tendency for
children with a higher self-concept to be more popular. With one
exception, all scales were administered in one afternoon and repeated
a week later. The exception was the failure to administer the
sociometric task the second time in the sixth grade class. The above
n’s exclude subjects who did not complete all three scales or whose
reading abilities, as indicated by teachers, were insufficient to
handle the scale vocabularies.
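
The scoring rule described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sociometric_scores`, the `rankings` structure, and the three example children are hypothetical, and the article does not say how ties or incomplete rankings were handled.

```python
from statistics import mean


def sociometric_scores(rankings):
    """Return each child's mean received rank (lower = more popular).

    `rankings` maps a rater to the list of same-sex classmates that the
    rater ordered from most preferred (rank 1) to least preferred.
    """
    received = {}
    for rater, ordered_classmates in rankings.items():
        for rank, child in enumerate(ordered_classmates, start=1):
            received.setdefault(child, []).append(rank)
    return {child: mean(ranks) for child, ranks in received.items()}


# Hypothetical rankings among three girls (not data from the study).
rankings = {
    "Ann": ["Bea", "Cat"],
    "Bea": ["Ann", "Cat"],
    "Cat": ["Bea", "Ann"],
}
scores = sociometric_scores(rankings)

# Negating the score makes a higher value mean a more popular child,
# which has the same effect as the sign change described in the text.
popularity = {child: -score for child, score in scores.items()}
```

With these made-up rankings, Bea receives two first-place ranks and so has the lowest (most popular) score.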


RESULTS 
Test-retest reliabilities are indicated in Table 1. In most cases
they are relatively high, and in all cases significant. In Table 2
are the Pearson correlation coefficients for girls and boys at each
grade level, indicating the relationships between anxiety and
self-concept, anxiety and sociometric status, and self-concept and
sociometric status. No sex differences were found and sexes at each
grade level were combined giving the resultant n’s and correlations
in Table 3. Five of the nine correlations were significant and one
approached significance.

TABLE 1
TEST-RETEST RELIABILITY COEFFICIENTS
[Table entries not legible.]


It can be seen that anxiety and self-concept showed consistent
negative correlations indicating high anxiety tended to be associated
with low self-concept. This finding replicated Lipsitt’s (1958)
results. Consistent negative correlations were also found between
anxiety and sociometric status such that high anxiety tended to be
associated with low sociometric status. This result replicated the
McCandless et al. (1956) study. Consistent positive correlations
between sociometric status and self-concept indicated high
sociometric status was associated with high self-concept.

Using Fisher transformations, the correlations between grades were
not found to be significantly different and were combined, yielding
n’s of 111 and the correlations shown in the first column of Table 4.
These correlations bear out the relationships just cited. The partial
correlations are also shown in Table 4 and suggest that the Pearson
r’s are not spurious, except perhaps for the correlation between
self-concept and sociometric status.
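
The comparison of correlations between grades is not spelled out in the text. One common way to make such a comparison, offered here only as a sketch, is to transform each r with Fisher's z and test the difference between two independent groups. The function names are hypothetical, the two r values are placeholders rather than figures from the tables, and only the group sizes (40 and 51) come from the METHOD section.

```python
import math


def fisher_z(r):
    """Fisher's r-to-z transformation (the inverse hyperbolic tangent)."""
    return math.atanh(r)


def compare_independent_correlations(r1, n1, r2, n2):
    """Two-sided z test for the difference between two independent r's."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z_stat = (fisher_z(r1) - fisher_z(r2)) / standard_error
    # Two-sided p value from the standard normal distribution.
    p_value = math.erfc(abs(z_stat) / math.sqrt(2.0))
    return z_stat, p_value


# Group sizes are the fourth- and fifth-grade n's given in METHOD;
# the two r values are hypothetical placeholders.
z_stat, p_value = compare_independent_correlations(-0.50, 40, -0.40, 51)
```

With groups of this size, two correlations as far apart as -.50 and -.40 do not differ significantly, which illustrates why modest grade-to-grade differences would permit pooling.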

TABLE 2
PEARSON CORRELATION COEFFICIENTS
[Rows: Fourth, Fifth, and Sixth grade; first column CMA-SC. Entries
not legible. Note: significant at .05 level.]

TABLE 3
PEARSON CORRELATION COEFFICIENTS
[Rows: Fourth, Fifth, and Sixth grade, sexes combined; columns
include CMA-SOC. Entries not legible. Note: significant at .01
level.]

TABLE 4
PEARSON, PARTIAL, AND MULTIPLE CORRELATION COEFFICIENTS
[Legible entries: .48**, .36**, .32**; remaining entries not
legible.]
Note. Sex and grade combined. N = 111. 0 = SOC, 1 = CMA scale,
2 = SC.
* Significant at less than the .05 level.
** Significant at less than the .01 level.

Because the absolute values of the obtained correlations were
relatively low an attempt was made to determine whether or not the
coefficients of multiple correlations would increase predictive
efficiency. Specifically, could each of the variables be predicted
more accurately by combining the other two variables in a multiple
prediction equation? The multiple correlation coefficients are also
shown in Table 4. They are highly significant, but again, relatively
low. In fact, the magnitudes of the multiple correlation coefficients
do not greatly increase predictive efficiency over the two variable
Pearson r correlations. For instance, self-concept correlates with
sociometric status at .32 and anxiety correlates with sociometric
status at -.36. Anxiety and self-concept together correlate with
sociometric status at .39. In each case less than 16% of the variance
is accounted for by the correlations. Similarly anxiety correlates
with self-concept at -.48, and with sociometric status at -.36.
Self-concept and sociometric status together correlate with anxiety
at .53. This multiple correlation accounts for 28% of the variance;
the -.48 Pearson r for anxiety and self-concept alone accounts for
23% of the variance.
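
The multiple correlations quoted in this section can be checked from the three pooled Pearson r's using the standard two-predictor formulas. The sketch below is an illustration, not the author's computation: the function and variable names are hypothetical, and the inputs are the rounded r's given in the text, so small rounding differences from the published values are to be expected.

```python
import math


def partial_r(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def multiple_r(r_y1, r_y2, r_12):
    """Multiple correlation of y with two predictors, from pairwise r's."""
    r_squared = (r_y1**2 + r_y2**2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12**2)
    return math.sqrt(r_squared)


# Pearson r's reported in the text (sex and grade combined).
r_cma_sc, r_cma_soc, r_sc_soc = -0.48, -0.36, 0.32

# Sociometric status predicted from anxiety and self-concept.
R_soc = multiple_r(r_cma_soc, r_sc_soc, r_cma_sc)
# Anxiety predicted from self-concept and sociometric status.
R_cma = multiple_r(r_cma_sc, r_cma_soc, r_sc_soc)
# Self-concept with sociometric status, anxiety held constant.
r_sc_soc_given_cma = partial_r(r_sc_soc, r_cma_sc, r_cma_soc)

print(f"R for SOC: {R_soc:.3f} ({R_soc**2:.1%} of variance)")
print(f"R for CMA: {R_cma:.3f} ({R_cma**2:.1%} of variance)")
print(f"partial r, SC with SOC given CMA: {r_sc_soc_given_cma:.3f}")
```

From the rounded inputs the multiple R for sociometric status comes out near .40 and that for anxiety near .53, in line with the reported .39 and .53. The self-concept and sociometric status correlation shrinks to roughly .18 once anxiety is held constant, which agrees in direction with the remark that this correlation may be partly spurious.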


DISCUSSION 


The trends of the present study are clear and closely replicate, in
one study, trends reported by other researchers on widely distributed
populations generally employing only two of the three variables
included in the present study. The more anxious child tends to have a
poorer self-concept and to be less popular on a ranking sociometric
than the less anxious child and the direction of relationship is
generally consistent between grades and sexes.

While the consistency of trend direction and agreement with previous
research suggests stable phenomena, the facts alone do not provide an
explanation. Lipsitt (1958) suggested that subjects who learn to
think of themselves negatively may do so under conditions of high
drive and that a low self-concept is related to drive level or
anxiety. The relationship between anxiety and self-concept can thus
be explained by appealing to the learning history of the individual
based upon the correlation of two measures involving self-statements
or self-ratings. The labels applied to the two self-rating scales are
a priori judgments and thus quite arbitrary. But sociometric status
reflects ratings of others. The response of the environment
(classmates) to the individual is then related to self-statements. As
such we have an objective criterion score involved in the
correlations involving sociometric score.

However, the more important fact illustrated by this study which is
in agreement with previous research is the consistently low magnitude
of the correlations obtained. In the present study the highest of the
Pearson correlations accounts for little more than 37% of the
variance. By combining sex and grade, the highest Pearson correlation
accounts for about 23% of the variance. Neither the partial
correlation nor the multiple correlation coefficients seem markedly
to improve predictive efficiency.

The consistently low correlations suggest that for the prediction of
any of the variables, either other variables must be added to the
pool, or the present ones must be broken down and measured more
accurately. In fact, the variables involved in the present study may
actually represent various levels of complexity. If anxiety is a
measure of drive, self-concept may be a measure of drive plus some
other components, which Lipsitt (1958) has suggested. Indeed, if the
items of the two scales are examined it is obvious that the
statements upon which the subject rates himself in the self-concept
scale represent more nearly habitual kinds of self-evaluative
responses for children. Or rather, these are, in American culture,
the kinds of reflections children are encouraged to make concerning
themselves: “I am brave,” “I am honest,” “I am to be trusted,” etc.
They are, almost verbatim, part of the scout codes to which millions
of American children weekly pledge allegiance. On the other hand, the
majority of the statements of the anxiety scale are less likely to be
ones which American children habitually use. For instance, “I get
nervous when things do not go right for me,” “I worry about what
other people think of me,” “Often I have trouble getting my breath,”
“I have to go to the toilet more than most people.” These are not
statements which parents overtly use in evaluating children or which
they traditionally encourage their children to use as modes of
self-evaluation. In fact, the most discriminating items of the CMA
scale as identified by Levy (1958) are particularly nontraditionally
evaluative in nature.


SUMMARY 


The CMA scale, the Children’s Self-Concept Scale, and a ranking
sociometric were administered to fourth, fifth, and sixth grade
children. Results indicated that more anxious children tended to hold
poorer self-concepts and tended to be less popular than less anxious
children. The results are consistent with previous research both in
direction and magnitude of relationship. It was suggested that the
low level of the obtained correlations has important implications for
further research.
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